Patent abstract:
METHOD FOR ESTABLISHING AN IMPARTIAL MODEL AND METHOD FOR SELECTING A PLANT
The invention provides methods for characterizing metabolic profiles, phenotypic profiles and characteristic profiles in plants or groups of plants. Methods for establishing an unbiased model between a phenotypic profile and a metabolic profile, or between a characteristic profile and a metabolic profile, are also provided, as are methods for using such unbiased models to accurately predict the development of a phenotype of interest or a trait of interest in an independent immature plant. In an embodiment of the invention, an unbiased model is established using the phenotypic profiles and metabolic profiles of at least two groups of plants, in which the groups of plants exhibit different phenotypes or are grown under different environmental conditions. Alternatively, an unbiased model can be established using the characteristic profiles and metabolic profiles of at least two groups of plants, in which the groups of plants exhibit different characteristics or are grown under different environmental conditions. In such embodiments, the unbiased models of the invention can be determined using various combinations of partial least squares analysis, the analysis (...).
Publication number: BR112013012068B1
Application number: R112013012068-1
Filing date: 2011-11-16
Publication date: 2020-12-01
Inventor: Jan Hazebroek
Applicant: Pioneer Hi-Bred International, Inc.;
Main IPC classification:
Patent description:

FIELD OF THE INVENTION
[0001] The invention relates to the field of metabolomics and, more particularly, to the use of metabolome data and statistical analysis to predict phenotypes and characteristics in plants.
BACKGROUND OF THE INVENTION
[0002] The agricultural sector is continually developing new plant varieties designed to produce high yields under a variety of environmental and adverse conditions. At the same time, the industry also aims to decrease the costs and potential risks associated with traditional approaches, such as fertilizers, herbicides and pesticides. To meet these demands, plant breeding techniques have been developed and used to produce plants with desirable phenotypes. Such phenotypes may include, for example, increased crop yield and quality, increased crop tolerance to environmental conditions (e.g., drought, extreme temperatures), increased crop tolerance to viruses, fungi, bacteria and parasites, increased tolerance to herbicides, and changes in the composition of the resulting crop (e.g., increased sugar, starch, protein or oil).
[0003] To produce plants that exhibit a desirable phenotype, a wide variety of traditional (e.g., crossbreeding, hybridization) and modern (e.g., recombinant DNA technology) techniques can be employed. A crucial step in any of these methodologies is the evaluation of the phenotype and characteristics of the altered plants. Although strategies have been developed to reduce the time and cost required to make these assessments, significant time and cost are still needed to assess crops under different stresses, growing seasons and environmental conditions. As a result, much effort has been made to increase the throughput, decrease the cost, and increase the reliability and accuracy of the evaluation of new plant varieties.
[0004] One approach to evaluating new varieties of plants is to screen their genomes to determine whether they contain genes of interest. This can be achieved using indirect (e.g., marker-assisted selection) or direct (e.g., Southern blots) detection methods, which determine whether or not a gene of interest is expressed in a plant without having to grow the plant to maturity. However, a disadvantage of this approach is that it requires knowledge of the particular gene of interest and does not necessarily produce a reliable prediction of the plant's phenotype at maturity. Other techniques, such as RNA or protein screening, suffer from similar disadvantages, in that the genes of interest must be known and the accuracy and precision of predicting the plant's phenotype are relatively low. As a result, the development of techniques that can accurately predict the development of a phenotype or characteristics in altered plants, and that eliminate the need to grow such plants to maturity under many simulated conditions, would be particularly advantageous.
[0005] Metabolomics is the systematic study of the complete set of metabolites (that is, the metabolome) found in a biological cell, tissue, organ or organism at a given point in time. In plants, metabolomics allows an unbiased measurement of the metabolite biochemistry that evolves as light energy, water, carbon dioxide and nutrients are converted into biomass within a changing environment. The time scales of this biochemistry range from seconds to months, and variability within an organism's metabolome can be regulated by changes in gene expression, stress or changes in the environment. Although efforts have been made to relate the metabolome of a new plant variety to a phenotype or trait of interest, such studies can be challenging and inaccurate.
[0006] Typically, the metabolic profiles of altered and unaltered plants (or plant tissues or organs) must be produced. Such plants may need to be grown to maturity under a variety of environmental conditions or under different types of stress. Metabolic profiles usually consist of named metabolites whose identities may or may not be known. Naming and quantifying metabolites with high fidelity is typically slow and laborious. Subsequently, the metabolic profiles of the altered and unaltered plants must be compared to determine differences in the levels of specific metabolites. Quantities of the known metabolites within this subset are often mapped onto specific metabolic pathways. Finally, predictions can be made about what effect, if any, the observed differences may have had on the phenotype or trait of interest. Thus, the use of metabolomics to evaluate and predict plant phenotypes can be complex and expensive.
[0007] As such, the development of simple, low-cost methods that can accurately relate the metabolome to phenotypes or characteristics in new plant varieties would be extremely beneficial for the agricultural industry. Methods that can accurately predict the development of such phenotypes or characteristics early in a plant's life cycle would be particularly advantageous. The development of chemometric models that eliminate the need to grow new varieties of plants under different environmental conditions or different types of stress in order to predict the development of a phenotype or characteristic of interest would also be particularly valuable.
BRIEF SUMMARY OF THE INVENTION
[0008] Methods for characterizing metabolic profiles, phenotypic profiles and characteristic profiles in plants are provided. Methods for establishing an unbiased model between a phenotypic profile and a metabolic profile, or between a characteristic profile and a metabolic profile, are also provided by the invention. Such unbiased models are useful for accurately predicting the development of a phenotype of interest or a trait of interest in an independent immature plant. For example, in an embodiment of the invention, an unbiased model is established to identify correlations between the metabolic profile and the phenotypic or characteristic profile of two or more groups of plants. Subsequently, the identified correlations can be used to predict the development of a phenotype or characteristic in an independent plant for which only the metabolic profile has been characterized.
[0009] In an embodiment of the invention, an unbiased model is established using the phenotypic profiles and metabolic profiles of at least two groups of plants, in which the groups of plants exhibit different phenotypes or are grown under different environmental conditions. Alternatively, an unbiased model can be established using the characteristic profiles and metabolic profiles of at least two groups of plants, in which the groups of plants exhibit different characteristics or are grown under different environmental conditions. In such embodiments, the unbiased models of the invention can be determined using various combinations of partial least squares analysis, partial least squares discriminant analysis, principal component analysis, cross-validation, variable importance for projection calculations, support vector machines and neural networks.
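For illustration only, the partial least squares calibration and the cross-validated selection of latent variables mentioned above could be sketched as follows. The sketch assumes a single quantitative phenotype (PLS1), the NIPALS algorithm, and leave-one-out cross-validation scored by the predicted residual sum of squares (PRESS). These choices, and all function names, are assumptions made for the example and are not specified by the invention.

```python
from math import sqrt

def _center(rows):
    """Column-center a list of rows; return centered rows and column means."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows], means

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by NIPALS (hypothetical helper)."""
    Xc, x_mean = _center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    n_features = len(x_mean)
    comps = []
    for _ in range(n_components):
        # Weight vector: covariance direction between X and y.
        w = [sum(r[j] * yi for r, yi in zip(Xc, yc)) for j in range(n_features)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no covariance left to model
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(r, w)) for r in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(r[j] * ti for r, ti in zip(Xc, t)) / tt for j in range(n_features)]
        q = sum(yi * ti for yi, ti in zip(yc, t)) / tt
        # Deflate X and y before extracting the next latent variable.
        Xc = [[v - ti * pj for v, pj in zip(r, p)] for r, ti in zip(Xc, t)]
        yc = [yi - q * ti for yi, ti in zip(yc, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict the response for one new metabolic profile x."""
    x_mean, y_mean, comps = model
    r = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(r, w))
        y_hat += q * t
        r = [v - t * pj for v, pj in zip(r, p)]
    return y_hat

def loo_press(X, y, n_components):
    """Leave-one-out PRESS for a given number of latent variables."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press

def select_latent_variables(X, y, max_components):
    """Return the component count with the lowest PRESS, plus all PRESS values."""
    press = {a: loo_press(X, y, a) for a in range(1, max_components + 1)}
    return min(press, key=press.get), press
```

In this sketch, each row of `X` would hold the pre-processed intensities of one plant sample and `y` the measured phenotype, and `select_latent_variables` would be called before fitting the final model with the chosen number of components.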
[0010] The metabolic profiles of the plants or groups of plants covered by the invention can be characterized, for example, using chromatography and mass spectrometry techniques. In a particular embodiment, the mass-to-charge fragments detected by mass spectrometry, which comprise the raw metabolic profiles of the invention, are not identified, characterized or otherwise biased before statistical analysis. Only signal denoising, alignment, baseline correction and normalization pre-processing steps are performed. Thus, the metabolic profiles of the invention comprise the entire set of metabolites that are detected and pre-processed.
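As a minimal sketch of the normalization step, the following example scales each raw intensity by the internal standard intensity and the dry sample weight. Dividing by the product of the two quantities is one plausible reading of the normalization and is an assumption here, as are the function name, the argument names and the use of (retention index, mass-to-charge) pairs as feature keys.

```python
def normalize_intensities(intensities, internal_standard_intensity, dry_weight_mg):
    """Scale raw intensities by internal-standard intensity and dry sample weight.

    intensities maps a (retention_index, mz) pair to a raw intensity.
    All names and the division-by-product convention are illustrative assumptions.
    """
    if internal_standard_intensity <= 0 or dry_weight_mg <= 0:
        raise ValueError("internal standard intensity and dry weight must be positive")
    scale = internal_standard_intensity * dry_weight_mg
    return {feature: value / scale for feature, value in intensities.items()}
```

Normalizing in this way would make intensities comparable between samples that differ in injected amount or in the mass of tissue extracted.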
[0011] Methods are also provided to predict the development of a phenotype or trait of interest in an independent plant, that is, a plant that was not used to establish the unbiased model of the invention. In one embodiment, the unbiased models of the invention are applied to the metabolic profile of an independent, immature plant in order to predict the development of a phenotype or trait of interest in the plant. In another embodiment, immature plants are selected based on their predicted development of a phenotype or trait of interest.
[0012] The following embodiments are covered by the present invention:
1. A method for establishing an unbiased model using the metabolic profile and phenotypic profile of at least two groups of plants, said method comprising: a) characterizing the phenotypic profiles of said at least two groups of plants, in which said at least two groups of plants have different phenotypes, or in which said at least two groups of plants are grown under different environmental conditions; b) extracting the metabolites from said at least two groups of plants; c) separating said metabolites by chromatography to generate a first data set; d) detecting the mass-to-charge fragments produced by said metabolites using mass spectrometry to generate a second data set; e) pre-processing said first data set and said second data set to align, reduce noise and dimensionality, and normalize; f) using the pre-processed data from step (e) to build a partial least squares multivariate calibration to predict quantitative results; g) using validation or cross-validation to select latent variables; and h) providing an output to a user of said unbiased model.
2. The method of embodiment 1, further comprising predicting a phenotype of a plant, said prediction comprising: a) determining the metabolic profile of at least one independent plant, in which said at least one independent plant is not mature; and b) using said unbiased model of embodiment 1 and said metabolic profile of said at least one independent plant to predict the expression of said phenotype in said at least one independent plant.
3. A method for selecting a plant that is predisposed to express a phenotype of interest, said method comprising: a) using the method of embodiment 2 to predict the expression of said phenotype of interest in said at least one independent plant; and b) selecting said at least one independent plant that is predicted to express said phenotype of interest.
4. The method of embodiment 3, wherein said at least one independent plant comprises at least one transgene.
5. The method of any of the previous embodiments, in which said method of building partial least squares multivariate calibrations further comprises the use of partial least squares discriminant analysis.
6. The method of any of the previous embodiments, in which outliers in said unbiased model are identified through principal component analysis and cross-validation.
7. The method of any of embodiments 1-4, wherein said unbiased model is established using support vector machines.
8. The method of any of embodiments 1-4, in which said unbiased model is established using neural networks.
9. The method of any of the previous embodiments, in which variable importance for projection calculations are used to estimate the importance of said metabolites in said unbiased model.
10. The method of any of the previous embodiments, wherein the separation of said metabolites by chromatography is carried out using gas chromatography.
11. The method of any of the previous embodiments, in which said metabolites are detected by mass spectrometry using a time-of-flight mass spectrometer.
12. The method of embodiment 11, wherein said pre-processing of said first data set and said second data set to reduce noise and dimensionality comprises: a) fitting the mass-to-charge fragments to a common time grid; b) reducing noise and dimensionality using statistical analysis, in which said statistical analysis includes smoothing, noise subtraction or thresholding; c) aligning the retention times or retention indices of the mass-to-charge fragments using a local displacement function; d) filtering mass-to-charge fragment × retention time or index combinations using threshold consistency functions; and e) normalizing said mass-to-charge fragment × retention time or index intensities to internal standard mass-to-charge intensity and dry sample weight.
13. The method of embodiment 12, which further comprises the steps of: a) establishing specific retention time or retention index windows; b) determining a correlation between said mass-to-charge fragments identified within said specific retention time or retention index windows; c) calculating a Pearson correlation coefficient matrix for said mass-to-charge fragments; d) clustering said mass-to-charge fragments using a K-nearest-neighbor clustering method, in which clusters are formed when a calculated neighbor distance is less than 1, and in which each said cluster requires more than 5 mass-to-charge fragments; e) eliminating mass-to-charge fragments that are not within said calculated neighbor distance of said cluster; and f) selecting said mass-to-charge fragments that are most frequently a maximum within each of said clusters to represent each said cluster in said unbiased model.
14. The method of any of the previous embodiments, wherein said at least two groups of plants are grown under precision growing conditions.
15. The method of any of the previous embodiments, wherein said at least one independent plant is grown under precision growing conditions or under natural conditions.
16. The method of any of the previous embodiments, wherein said at least two groups of plants have the same genetic background.
17. The method of any of the previous embodiments, wherein said at least one independent plant has the same genetic background as said at least two groups of plants.
18. The method of any of embodiments 1-16, wherein said at least one independent plant has a different genetic background than said at least two groups of plants.
19. The method of any of the previous embodiments, wherein said at least one independent plant is grown under the same environmental conditions as said at least two groups of plants.
20. The method of any of embodiments 1-18, wherein said at least one independent plant is grown under different environmental conditions than said at least two groups of plants.
21. The method of any of the previous embodiments, wherein said at least one independent plant is grown at the same time as said at least two groups of plants.
22. The method of any of embodiments 1-20, wherein said at least one independent plant is grown at a different time than said at least two groups of plants.
23. The method of any of the previous embodiments, wherein said at least one independent plant is grown in the same location as said at least two groups of plants.
24. The method of any of embodiments 1-22, wherein said at least one independent plant is grown in a different location than said at least two groups of plants.
25. The method of any of the previous embodiments, in which said different phenotypes of said at least two groups of plants are selected from the group consisting of plant growth, total plant area, biomass, shoot dry weight, yield, yield drag, nitrogen utilization efficiency, water use efficiency, pest resistance, disease resistance, transgene effects, response to chemical treatment, stress tolerance, gas exchange parameters, days to silk, days to shed, germination rate, relative maturity, lodging, ear height, flowering time, emergence stress rate, leaf senescence rate, canopy photosynthesis rate, silk appearance rate, anthesis-silking interval and percent recurrent parent.
26. The method of any of the previous embodiments, in which said different environmental conditions under which said at least two groups of plants are grown are selected from the group consisting of temperature, soil moisture, nitrogen level, insect pressure, disease pressure, soil type, pesticide treatment, herbicide treatment, day length, planting density, light intensity, light quality, no-till practice, planting day, carbon dioxide levels and oxygen levels.
27. The method of any of the foregoing embodiments, wherein said at least two groups of plants, or said at least one independent plant, are monocots or dicots.
28. The method of embodiment 27, wherein said monocots or dicots are corn, rice, barley, oats, millet, wheat, grass, soybean, cotton, sunflower, safflower, Arabidopsis, tobacco, rapeseed, sugar cane, alfalfa, canola, clover, tomato, potato, cassava or sorghum.
29. A method for establishing an unbiased model using the metabolic profile and characteristic profile of at least two groups of plants, said method comprising: a) characterizing the characteristic profiles of said at least two groups of plants, in which said at least two groups of plants have different characteristics, or in which said at least two groups of plants are grown under different environmental conditions; b) extracting the metabolites from said at least two groups of plants; c) separating said metabolites by chromatography to generate a first data set; d) detecting the mass-to-charge fragments produced by said metabolites using mass spectrometry to generate a second data set; e) pre-processing said first data set and said second data set to align, reduce noise and dimensionality, and normalize; f) using the pre-processed data from step (e) to build a partial least squares multivariate calibration to predict quantitative results; g) using validation or cross-validation to select latent variables; and h) providing an output to a user of said unbiased model.
30. The method of embodiment 29, which further comprises predicting a characteristic in a plant, wherein said prediction comprises: a) determining the metabolic profile of at least one independent plant, in which said at least one independent plant is not mature; and b) using said unbiased model of embodiment 29 and said metabolic profile of said at least one independent plant to predict the expression of said characteristic in said at least one independent plant.
31. A method for selecting a plant that is predisposed to express a characteristic of interest, said method comprising: a) using the method of embodiment 30 to predict the expression of said characteristic of interest in said at least one independent plant; and b) selecting said at least one independent plant that is predicted to express said characteristic of interest.
32. The method of embodiment 31, wherein said at least one independent plant comprises at least one transgene.
33. The method of any of embodiments 29-32, wherein said method of building partial least squares multivariate calibrations further comprises the use of partial least squares discriminant analysis.
34. The method of any of embodiments 29-33, in which outliers in said unbiased model are identified through principal component analysis and cross-validation.
35. The method of any of embodiments 29-32, wherein said unbiased model is established using support vector machines.
36. The method of any of embodiments 29-32, in which said unbiased model is established using neural networks.
37. The method of any of embodiments 29-36, in which variable importance for projection calculations are used to estimate the importance of said metabolites in said unbiased model.
38. The method of any of embodiments 29-37, wherein the separation of said metabolites by chromatography is carried out using gas chromatography.
39. The method of any of embodiments 29-38, wherein said metabolites are detected by mass spectrometry using a time-of-flight mass spectrometer.
40. The method of embodiment 39, wherein said pre-processing of said first data set and said second data set to reduce noise and dimensionality comprises: a) fitting the mass-to-charge fragments to a common time grid; b) reducing noise and dimensionality using statistical analysis, in which said statistical analysis includes smoothing, noise subtraction or thresholding; c) aligning the retention times or retention indices using a correlation-based alignment function; d) filtering mass-to-charge fragment × retention time or index combinations using threshold and threshold consistency functions; and e) normalizing said mass-to-charge fragment × retention time or index intensities to internal standard mass-to-charge intensity and dry sample weight.
41. The method of embodiment 40, which further comprises the steps of: a) establishing specific retention time or retention index windows; b) determining a correlation between said mass-to-charge fragments identified within said specific retention time windows; c) calculating a Pearson correlation coefficient matrix for said mass-to-charge fragments; d) clustering said mass-to-charge fragments using a K-nearest-neighbor clustering method, in which clusters are formed when a calculated neighbor distance is less than 1, and in which each said cluster requires more than 5 mass-to-charge fragments; e) eliminating mass-to-charge fragments that are not within said calculated neighbor distance of said cluster; and f) selecting said mass-to-charge fragments that are most frequently a maximum within each of said clusters to represent each said cluster in said unbiased model.
42. The method of any of embodiments 29-41, wherein said at least two groups of plants are grown under precision growing conditions.
43. The method of any of embodiments 29-42, wherein said at least one independent plant is grown under precision growing conditions or under natural conditions.
44. The method of any of embodiments 29-43, wherein said at least two groups of plants have the same genetic background.
45. The method of any of embodiments 29-44, wherein said at least one independent plant has the same genetic background as said at least two groups of plants.
46. The method of any of embodiments 29-44, wherein said at least one independent plant has a different genetic background than said at least two groups of plants.
47. The method of any of embodiments 29-46, wherein said at least one independent plant is grown under the same environmental conditions as said at least two groups of plants.
48. The method of any of embodiments 29-46, wherein said at least one independent plant is grown under different environmental conditions than said at least two groups of plants.
49. The method of any of embodiments 29-48, wherein said at least one independent plant is grown at the same time as said at least two groups of plants.
50. The method of any of embodiments 29-48, wherein said at least one independent plant is grown at a different time than said at least two groups of plants.
51. The method of any of embodiments 29-50, wherein said at least one independent plant is grown in the same location as said at least two groups of plants.
52. The method of any of embodiments 29-50, wherein said at least one independent plant is grown in a different location than said at least two groups of plants.
53. The method of any of embodiments 29-52, in which said different characteristics of said at least two groups of plants are selected from the group consisting of leaf angle, crown width, ear-width of the leaf width, grain dispersion, root mass, stem strength, seed moisture, greensnap, breakage, visual pigment accumulation, grains per ear, ears per plant, seed size, kernel density, leaf nitrogen content and grain nitrogen content.
54. The method of any of embodiments 29-53, in which said different environmental conditions under which said at least two groups of plants are grown are selected from the group consisting of temperature, soil moisture, nitrogen level, insect pressure, disease pressure, soil type, pesticide treatment, herbicide treatment, day length, planting density, light intensity, light quality, no-till practice, planting day, carbon dioxide levels and oxygen levels.
55. The method of any of embodiments 29-54, wherein said at least two groups of plants, or said at least one independent plant, are monocots or dicots.
56. The method of embodiment 55, wherein said monocots or dicots are corn, rice, barley, oats, millet, wheat, grass, soybean, cotton, sunflower, safflower, Arabidopsis, tobacco, rapeseed, sugar cane, alfalfa, canola, clover, tomato, potato, cassava or sorghum.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0013] Figure 1 shows measured versus predicted shoot dry weight of maize inbred lines grown under normal nitrogen conditions.
[0014] Figure 2 shows measured versus predicted shoot dry weight of maize inbred lines grown under low nitrogen conditions.
[0015] Figure 3 shows the measured shoot dry weight of maize inbred lines grown under low nitrogen versus the shoot dry weight predicted by partial least squares analysis and cross-validation using the metabolome of the lines that received normal nitrogen.
[0016] Figure 4 shows the genotype-specific shoot dry weight, compared between nitrogen-deprived plants and plants receiving sufficient nitrogen, as predicted by the metabolomics-based PLS model and plotted against the measured shoot dry weight.
[0017] Figure 5 presents the modeling of metabolic changes produced by water stress in a whole range of genotypes and environments.
[0018] Figure 6 shows the predicted class of transgene events, which was statistically separated from the null segregants in the predicted direction using the well-watered metabolome.
DETAILED DESCRIPTION OF THE INVENTION
[0019] The present invention will now be described more fully below with reference to the accompanying drawings, in which some, but not all, of the embodiments of the invention are shown. Indeed, these inventions can be embodied in many different forms and should not be interpreted as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this description satisfies applicable legal requirements.
[0020] Many modifications and other embodiments of the inventions set forth herein will come to the mind of a person skilled in the art to which these inventions belong, having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions should not be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are used herein, they are used in a generic and descriptive sense only and not for the purpose of limitation.
I. Overview
[0021] The invention provides methods for characterizing metabolic profiles, phenotypic profiles and characteristic profiles in plants or groups of plants. Methods for establishing an unbiased model between a phenotypic profile and a metabolic profile, or between a characteristic profile and a metabolic profile, are also provided by the invention, as are methods for using such unbiased models to accurately predict the development of a phenotype of interest or a trait of interest in an immature independent plant.
[0022] In an embodiment of the invention, an unbiased model is established using the phenotypic profiles and metabolic profiles of at least two groups of plants, in which the groups of plants exhibit different phenotypes or are grown under different environmental conditions. Alternatively, an unbiased model can be established using the characteristic profiles and metabolic profiles of at least two groups of plants, in which the groups of plants exhibit different characteristics or are grown under different environmental conditions. In such embodiments, the unbiased models of the invention can be determined using various combinations of partial least squares analysis, partial least squares discriminant analysis, principal component analysis, cross-validation, variable importance for projection calculations, support vector machines and neural networks. The metabolic profiles of the plants or groups of plants covered by the invention can be characterized, for example, using mass spectrometry and chromatography techniques. In a particular embodiment, the mass-to-charge fragments detected by mass spectrometry, which comprise the metabolic profiles of the invention, are not identified or otherwise characterized before statistical analysis. Only signal denoising, alignment, baseline correction and normalization pre-processing steps are performed. Thus, the metabolic profiles of the invention comprise the entire set of metabolites that are detected and pre-processed.
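By way of example, the variable importance for projection (VIP) calculation mentioned above could be sketched as follows, using the commonly used definition in which the importance of each variable is its squared normalized weight in each latent variable, weighted by the response variance that latent variable explains. The function and argument names are hypothetical, and the inputs (weight vectors, score vectors and response loadings) are assumed to come from a previously fitted partial least squares model.

```python
from math import sqrt

def vip_scores(weights, scores, y_loadings):
    """Variable importance for projection for a single-response PLS model.

    weights:    one weight vector per latent variable (each of length p)
    scores:     one score vector per latent variable (each of length n)
    y_loadings: one response loading per latent variable
    All names are illustrative; the inputs come from an already fitted model.
    """
    p = len(weights[0])
    # Response variance explained by each latent variable: q_a^2 * t_a.t_a
    explained = [q * q * sum(t * t for t in t_a)
                 for q, t_a in zip(y_loadings, scores)]
    total = sum(explained)
    vip = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, explained):
            acc += ss * (w[j] ** 2) / sum(v * v for v in w)
        vip.append(sqrt(p * acc / total))
    return vip
```

Because the squared VIP values sum to the number of variables, a value above 1 is often read as above-average importance, although that cutoff is a convention rather than something stated in this description.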
[0023] Methods are also provided to predict the development of a phenotype or characteristic of interest in a plant that was not used to establish the unbiased model of the invention, that is, in an independent plant. While the independent plant can be a plant at any stage of development, in one embodiment the unbiased models of the invention are applied to the metabolic profile of an immature independent plant in order to predict the development of a phenotype or characteristic of interest in that plant. In another embodiment, immature plants are selected for use based on their predicted development of a phenotype or characteristic of interest.
II. Analytical Techniques for Characterizing the Metabolic Profiles, Phenotypic Profiles and Characteristic Profiles of a Plant or Groups of Plants
[0024] Methods of the invention provide means for the characterization of metabolic profiles, phenotypic profiles and characteristic profiles of a plant or group of plants. Embodiments of the invention encompass the use of such profiles to establish unbiased models for predicting a phenotype or characteristic of interest in an independent immature plant.
[0025] As used herein, the terms "metabolic profile" and "metabolome" are intended to mean the collection of metabolites detected in a sample taken from a plant. The term "metabolite" is intended to mean a compound that is produced within an organism by any process of anabolism or catabolism. The compound may be naturally occurring or may be induced by expression of a transgene. The term "phenotypic profile" is intended to mean the measurable characteristics of a plant that relate to a particular function of the plant. Likewise, the term "characteristic profile" is intended to mean the measurable characteristics of a plant that contribute to a particular phenotype of interest. Examples of such characteristics of interest and phenotypes of interest are further described herein below.
[0026] The term "characterize" is intended to mean the use of analytical methods to collectively describe the components that make up a profile. In the case of a metabolic profile, "characterize" means to describe completely the metabolites of a sample taken from a plant. In the case of a phenotypic profile, "characterize" means to describe completely the measurable characteristics of the plant that relate to a particular plant function. In the case of a characteristic profile, "characterize" means to describe completely the measurable characteristics of a plant that contribute to a particular phenotype of interest.
[0027] In an embodiment of the invention, the metabolic profile of a plant is characterized by extracting the metabolites from a sample obtained from a plant, plant cell, or plant part and detecting such metabolites by various analytical methods. As used herein, the terms "extract" or "extracted" are intended to mean any methods that allow the isolation of analytes of interest (i.e., metabolites) from a sample matrix, or a sample derived therefrom. The term "extraction", or derivations thereof, does not necessarily refer to the removal of all materials or components other than the analytes of interest from a sample matrix, or a sample derived from it. Instead, in some embodiments, the term "extraction" refers to a process that enriches the amount of one or more analytes of interest relative to one or more other components present in the sample matrix or in a derived sample. In other embodiments, an "extraction" process can be used to remove one or more components from a sample that may interfere with the detection of the analyte. For example, these components can be those that interfere with the detection of an analyte ion by mass spectrometry. In yet other embodiments, the extraction procedure is used to remove the analytes of interest from the test sample matrix. Various extraction techniques can be used to extract and purify analytes of interest from a sample, and the selection of suitable techniques for extracting analytes of interest from specific plants, plant cells or plant parts would be known to one skilled in the art. In a particular embodiment of the invention, the analytes of interest are extracted from a sample using the techniques described in the Examples provided herein below.
[0028] The invention further provides methods for separating analytes of interest in an extracted sample, wherein such separation facilitates their detection. In one embodiment, the separation of the analytes of interest comprises chromatographic separation. As used herein, "chromatographic separation" employs an "analytical column" or "chromatography column" having sufficient chromatographic plates to separate the components of a test sample matrix. Preferably, the components eluted from the analytical column are separated in such a way as to allow the presence or quantity of the analyte(s) of interest to be determined. "Analytical columns" can be distinguished from "extraction columns", which are typically used to separate or extract retained materials from non-retained materials to obtain a "purified" sample for further analysis or purification.
[0029] In particular embodiments of the invention, the analytes of interest are chromatographically separated from each other to facilitate their detection. In such embodiments, the chromatographic separation of the analytes of interest includes: (a) loading the composition comprising the extracted analyte(s) onto an analytical column, and (b) eluting the analyte(s) from the analytical column. Suitable chromatography methods include, but are not limited to, high performance liquid chromatography (HPLC), gas chromatography (GC), reverse phase HPLC, ion exchange HPLC, gel permeation chromatography, capillary electrophoresis, electrophoresis, thin-layer chromatography, chip-based microfluidic separation, and affinity interaction chromatography using antibodies or other ligand-specific binding domains. It is recognized that, depending on the detection method employed, it may not be necessary to separate each analyte of interest from the others by chromatography. Such detection methods allow each analyte to be detected when present in a mixture.
[0030] In an embodiment of the invention, the chromatographic separation of the analytes of interest in a sample comprises the use of a gas chromatograph and a GC column. Gas chromatographs normally comprise a GC column and a column inlet, which is used to introduce a sample into the GC column. Various GC columns can be used in the methods of the invention, including, but not limited to, packed columns, capillary columns, internally heated Microfast columns and micro-packed columns. Any GC column that can sufficiently resolve the analytes of interest and allow their detection and/or quantification can be employed, and such columns would be known to those skilled in the art. In a particular embodiment of the invention, the analytes of interest are separated on a GC column as described in the Examples presented below. The data generated by the separation methods described herein are considered to be the "first data set."
[0031] The invention further provides methods for detecting the presence of the analytes of interest in an extracted sample. In an embodiment of the invention, the analytes of interest are detected after chromatographic separation using any of a number of analytical instruments including, but not limited to, nuclear magnetic resonance imaging (NMR) devices, mass spectrometers (MS), electrochemical arrays (EC), and/or combinations thereof. As used herein, "detect" or "detected" is defined as the determination of the presence or quantity of an analyte of interest in a test sample. The detection method is not restricted and can be qualitative or quantitative.
[0032] In another such embodiment of the invention, the detection of analytes of interest comprises analyzing the chromatographically separated analytes using mass spectrometry. As used herein, the terms "mass spectrometry" or "MS" generally refer to methods of filtering, detecting and measuring ions based on their mass-to-charge ratio, "m/z". In MS techniques, one or more molecules of interest are ionized, and the ions are subsequently introduced into a mass spectrographic instrument (i.e., a mass spectrometer), where, due to a combination of electric and magnetic fields, the ions follow a path in space that is dependent on their mass ("m") and charge ("z"). See, for example, U.S. Patent No. 6,107,623, entitled "Methods and Apparatus for Tandem Mass Spectrometry", which is incorporated herein by reference in its entirety.
[0033] Mass spectrometers that can be used in the methods of the invention typically comprise three components: an ionization source, a mass analyzer and a detector. Ionization methods that may be suitable for use in the methods of the invention include, but are not limited to, chemical ionization, electron ionization, inductively coupled plasma, glow discharge, field desorption, fast atom bombardment, atmospheric pressure chemical ionization, spark ionization and thermal ionization. Types of mass analyzers that may be useful in the methods of the invention include, but are not limited to, sector, quadrupole, quadrupole ion trap, linear quadrupole ion trap, Fourier transform ion cyclotron resonance, orbitrap and time-of-flight. Detectors that can be used in the methods of the invention include, but are not limited to, electron multipliers or secondary emission multipliers.
[0034] In a particular embodiment of the invention, a time-of-flight (ToF) mass analyzer can be used in conjunction with a gas chromatograph, an ionization source and a detector to detect the ions derived from the analytes of interest. As used herein, a "time-of-flight mass analyzer" is considered to be a specific type of mass analyzer, in which ions are introduced from an ionization source and are accelerated by an electric field of known strength. The accelerated ions are introduced into a field-free drift region, through which they travel to a detector located at the distal end of the drift region. Ions separate in the mass analyzer according to their mass-to-charge ratio (m/z), such that heavier ions travel more slowly than lighter ions. Such separation results in different arrival times at the detector, where the transit time of each ion is recorded. Data generated by the detection methods described herein are considered to be the "second data set". As used in the context of mass spectrometry analysis, "data" and "data set" mean the individual measurements, or the collection of measurements, that are recorded by a detector after separation of metabolite-derived ions in a mass analyzer.
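The mass dependence of the transit time described above can be illustrated with the textbook energy balance for an idealized linear ToF analyzer: an ion with charge z·e accelerated through a potential V gains kinetic energy z·e·V = ½·m·v², so its drift time over a length L is t = L·sqrt(m / (2·z·e·V)). The following is a minimal sketch of that relation only. The drift length and accelerating voltage are illustrative assumptions and are not taken from the invention.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per unified atomic mass unit


def flight_time(mz, drift_length_m=1.0, accel_voltage_v=2000.0):
    """Return the drift time in seconds for an ion of the given m/z.

    mz is the mass in daltons divided by the number of charges.
    drift_length_m and accel_voltage_v are hypothetical instrument settings.
    """
    # Convert m/z to kilograms per coulomb.
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE
    # From z*e*V = 1/2*m*v^2: v = sqrt(2*V / (m / (z*e))).
    velocity = math.sqrt(2.0 * accel_voltage_v / mass_per_charge)
    return drift_length_m / velocity


light = flight_time(100.0)
heavy = flight_time(400.0)
```

Because the drift time scales with the square root of m/z, the ion at m/z 400 takes twice as long to reach the detector as the ion at m/z 100 under these assumptions.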
[0035] It is recognized that several methods can be used to increase the resolution of the signals produced by a ToF mass analyzer, including, but not limited to, delayed extraction, ion propagation or orthogonal acceleration. Additional methods that can be used to improve the resolution of ToF mass spectrometry include Hadamard transform ToF mass spectrometry, tandem ToF/ToF mass spectrometry, or the use of a reflectron. Various types of recorders can be used with a ToF mass spectrometer to record electrical signals from the detector, including, but not limited to, fast time-to-digital converters or analog-to-digital converters.
[0036] Methods of the invention also provide means for characterizing phenotypic profiles or characteristic profiles of plants or groups of plants. Such phenotypes and characteristics can be evaluated in plants or groups of plants using any of a number of assays and techniques that would be known to a person of ordinary skill in the art. Embodiments of the invention include the use of techniques and analyses that detect changes in various plant characteristics, including, but not limited to, chemical composition, morphology, biomass or physiological responses to stress conditions. In addition, the physiological properties of the altered plants of the present invention can be identified by assessing responses to stress conditions, for example, in assays that use imposed stress conditions to detect improved responses to stress, nitrogen deficiency, cold or hot growing conditions, pathogen or insect attack, or light deficiency, or, alternatively, under naturally occurring stress conditions, for example, in field conditions. Altered chemical compositions, such as nutritional grain composition, can be detected by analyzing, for example, the composition of the seed and its content of protein, free amino acids, oil, free fatty acids, starch or tocopherols. Biomass measurements can be made on greenhouse-grown or field-grown plants and may include measurements such as plant height, stem diameter, root and shoot dry weights, dry matter partitioning between the different organs of the plant and, for corn plants, flowering behavior, ear length and ear diameter.
[0037] Embodiments of the invention also provide methods for collecting phenotypic data and characteristic data on morphological changes by visual observation. Such phenotypic and characteristic data may include, but are not limited to, characteristics such as normal plants, bushy plants, taller plants, thicker stems, narrow leaves, striped leaves, knotted phenotype, chlorosis, albino, anthocyanin production, or altered tassels, ears or roots. Other altered phenotypes and characteristics can be identified by measurements taken in field conditions, such as days to pollen shed, days to flowering, leaf extension rate, chlorophyll content, leaf temperature, stand, seedling vigor, internode length, plant height, number of leaves, leaf area, tillering, brace roots, stay green, stalk lodging, root lodging, plant health, sterility/prolificacy, green snap and pest resistance. In addition, phenotypic characteristics of harvested grain can be assessed, including the number of kernels per row on the ear, number of rows of kernels on the ear, kernel abortion, kernel weight, kernel size, kernel density and physical grain quality.
[0038] In particular embodiments of the invention, visual observation of plant phenotypes and characteristics can also be performed using an automated system. In such an embodiment, the method involves visually observing plants growing in a controlled greenhouse environment and transferring the plants at selected times to an image analysis area where a quantitative, non-destructive, digital light spectrum image analyzer, preferably with an instrumental variation below 5%, takes reflected-light images of the plant. The analyzer then analyzes the images to determine a value for a parameter or phenotypic characteristic of interest of the plant. Such an automated system is described in U.S. Patent Application No. 11/669,377, which is incorporated herein by reference in its entirety.
III. Establishing an Impartial Model
[0039] The methods of the invention provide for establishing an impartial model between at least two groups of plants that can be applied to the metabolic profile of an independent plant to predict a phenotype of interest in the independent plant. The independent plant can be at any stage of development, including an immature plant. In addition, the metabolic profile of the independent plant can be characterized at the same age or stage of development as that of the plants used to establish the impartial model of the invention, or at a different age or stage of development than that of the plants used to establish the model.
[0040] In one embodiment, pre-processing steps are used to reduce the noise and dimensionality of the chromatography data (first data set) and mass spectrometry data (second data set) before establishing the unbiased models of the invention. The method of the invention encompasses predictive multivariate models combined with highly replicated experiments; thus, the pre-processing steps advantageously reduce the noise and dimensionality of large data sets. As used herein, "pre-processing" of data sets means applying statistical analysis to raw data in order to reduce the noise and dimensionality of the data, as well as to reduce the potential weighting of the data toward specific metabolites that may produce many mass-to-charge signals. The term "dimensionality" refers to the number of variables under consideration for a data set. The term "noise" refers to the presence of any signal in the data set other than the signals that are desired for analysis. As used in the context of mass spectrometry analysis, "noise" means chemical-based signal of inconsistent abundance and electronics-based signal. "Reducing noise and dimensionality" means applying signal filtering and statistical techniques to reduce the number of variables in the data sets and improve the signal-to-noise ratio of the data.
[0041] Such reduction in noise and dimensionality is advantageous because the data sets of the invention comprise a large number of values: each metabolite can produce more than one mass-to-charge fragment value when detected by GC/ToF analysis. In an embodiment of the invention, pre-processing involves assigning data set values to a common time grid and using a first set of signal filtering and statistical techniques to reduce the noise and dimensionality of the data. Such techniques may include, but are not limited to, smoothing, noise subtraction, thresholding, and retention time or retention index alignment. As used herein, "smoothing" describes statistical techniques that create an approximating function that attempts to capture important patterns in the data sets while leaving out noise or other fine-scale structures and/or transient phenomena. "Smoothing" can also refer to the smoothing of the chromatogram. By "threshold" is intended a minimum value that a signal, detected by mass spectrometry analysis, must reach to be included in the analysis. "Retention time alignment" means applying an alignment function based on local displacement to the data set, using the first chromatogram of the data set as the retention time or retention index alignment reference. Such alignment functions can include, but are not limited to, correlation optimized warping, dynamic time warping and parametric time warping. Subsequent steps for pre-processing the data may also include filtering of mass-to-charge fragment versus retention time (or retention index) combinations using threshold and consistency functions. In addition, the intensities of mass-to-charge fragment versus retention time (or retention index) combinations can be normalized to the intensity of an internal standard mass-to-charge fragment and to the dry weight of the sample. The analysis of data sets using such techniques would be within the capability of a person of ordinary skill in the art.
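As a rough illustration of three of the pre-processing steps named above (smoothing, thresholding, and normalization to an internal standard and sample dry weight), the following minimal sketch operates on a single intensity trace. It is not the procedure of the invention: the centered moving average is one assumed choice of smoothing function, and the window size, threshold, internal-standard intensity and dry weight are hypothetical values. Retention time alignment is omitted.

```python
def moving_average(signal, window=3):
    """Smooth a trace with a centered moving average (assumed smoothing choice)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def apply_threshold(signal, minimum):
    """Zero out intensities below the minimum value required for inclusion."""
    return [x if x >= minimum else 0.0 for x in signal]


def normalize(signal, internal_standard, dry_weight_mg):
    """Scale intensities by the internal-standard intensity and the sample dry weight."""
    return [x / (internal_standard * dry_weight_mg) for x in signal]


# Hypothetical intensity trace for one mass-to-charge fragment over time.
trace = [0.0, 1.0, 0.0, 9.0, 12.0, 9.0, 0.0, 1.0, 0.0]
cleaned = normalize(
    apply_threshold(moving_average(trace), minimum=2.0),
    internal_standard=5.0,
    dry_weight_mg=2.0,
)
```

In this toy trace the small spikes at either end fall below the threshold after smoothing and are set to zero, while the central peak survives and is expressed relative to the internal standard and dry weight.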
[0042] One or more statistical analyses can be used to pre-process the data in order to reduce noise and dimensionality. Such analyses may include, but are not limited to, any combination of: assigning retention time windows to the data sets derived from the mass spectrometry analysis; calculating Pearson's correlation coefficients for the mass-to-charge fragments within the retention time or retention index windows; clustering such mass-to-charge fragments; and choosing, within each cluster, the mass-to-charge fragment with the highest frequency of being the maximum to represent the fragments of that cluster in further analysis.
[0043] "Retention time windows" refers to the specific time windows in the mass spectrometry analysis during which ion detection data are collected. These retention time windows can start with the last observed retention time in the data set and have a time interval of about 0.1-1.0 second. Retention index windows can also be established within the data sets of the invention. A "Pearson's correlation coefficient matrix" is used to describe a statistical method through which mass-to-charge fragments within the retention time or retention index windows are correlated with each other. A Pearson correlation is generally used to find a correlation between at least two continuous variables. The value of a Pearson correlation coefficient can be between -1.0 (that is, a perfect inverse correlation) and 1.0 (that is, a perfect positive correlation), where a value of 0.0 indicates that there is no correlation between the variables. A Pearson correlation coefficient matrix is calculated for all ions within the retention time or retention index windows.
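A minimal sketch of a Pearson correlation coefficient matrix for the fragments within a single retention time window is given below. The fragment labels and intensities are invented for illustration, and the dictionary layout (fragment label mapped to intensities across samples) is an assumption rather than a data format described by the invention.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant sequence has zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(fragments):
    """Return sorted fragment labels and their pairwise correlation matrix."""
    labels = sorted(fragments)
    rows = [[pearson(fragments[a], fragments[b]) for b in labels] for a in labels]
    return labels, rows


# Hypothetical window: m/z label -> intensity in each of four samples.
window = {
    73: [10.0, 20.0, 30.0, 40.0],
    147: [1.0, 2.0, 3.0, 4.0],   # rises and falls with m/z 73
    205: [4.0, 3.0, 2.0, 1.0],   # varies inversely with m/z 73
}
labels, matrix = correlation_matrix(window)
```

Fragments that originate from the same metabolite are expected to co-vary across samples, which is why a coefficient near 1.0 (as between the first two fragments here) suggests they can be treated as one group.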
[0044] As used herein, "grouping" (clustering) means the use of statistical analyses to assign the data of the invention to subsets (that is, aggregates or clusters), so that the values in the same cluster are similar in some sense. Such analyses may include, but are not limited to, the K-nearest neighbor agglomerative method. As used herein, the "K-nearest neighbor" agglomerative method describes statistical analyses in which a value is classified by a majority vote of its neighbors, with the value being assigned to the most common class among its K nearest neighbors, where K is typically a small, positive integer. In particular embodiments of the invention, clusters can be formed when the neighbor distance calculated in the Pearson correlation coefficient space defined by the retention time or retention index windows is less than 1.0, and where at least five mass-to-charge fragment signals are within the minimum distance. Mass-to-charge fragment signals that are not within the minimum distance of a five-member cluster can be eliminated from the data set. In other embodiments of the invention, the mass-to-charge fragment signal that has the highest frequency of being the maximum in each of the calculated clusters can be selected to represent that cluster in all samples in the data set.
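The grouping and representative-selection steps can be sketched as follows. This is a simplified stand-in, not the K-nearest neighbor agglomerative method itself: fragments are linked whenever their distance is below a cutoff (single linkage), and the distance is assumed to be 1 minus the Pearson coefficient, which the source does not specify. The example data are invented, and `min_size=2` is used only to keep the example small (the text above describes five-member clusters).

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cluster_fragments(fragments, max_distance=1.0, min_size=5):
    """Link fragments whose assumed distance (1 - r) is below max_distance."""
    unvisited = set(fragments)
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [m for m in unvisited
                    if 1.0 - pearson(fragments[current], fragments[m]) < max_distance]
            for m in near:
                unvisited.remove(m)
            cluster.extend(near)
            frontier.extend(near)
        if len(cluster) >= min_size:  # smaller groups are dropped from the data set
            clusters.append(sorted(cluster))
    return clusters


def representative(fragments, cluster):
    """Pick the fragment that is most often the most intense one across samples."""
    n_samples = len(fragments[cluster[0]])
    wins = Counter(max(cluster, key=lambda m: fragments[m][s]) for s in range(n_samples))
    return wins.most_common(1)[0][0]


window = {
    73: [10.0, 20.0, 30.0, 40.0],
    147: [1.0, 2.0, 3.0, 4.0],
    205: [5.0, 9.0, 16.0, 22.0],
    300: [4.0, 3.0, 2.0, 1.0],   # inversely correlated, so it joins no cluster
}
clusters = cluster_fragments(window, max_distance=1.0, min_size=2)
chosen = [representative(window, c) for c in clusters]
```

Keeping one representative per cluster reduces dimensionality and avoids weighting the model toward metabolites that happen to yield many fragments.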
[0045] The invention also provides methods for performing an impartial multivariate analysis of pre-processed data sets to establish an unbiased model, which relates metabolomic data to characteristic or phenotypic data. As used herein, "impartial" means that the metabolite data obtained by mass spectrometry analysis are not characterized, or otherwise identified or directed to specific metabolites or metabolic processes, prior to statistical analysis. It is also recognized that, in certain embodiments, the impartial model can be an impartial chemometric model. "Multivariate analysis" is intended to mean the use of any one of a number of statistical analyses, which are known to those skilled in the art, for the analysis of data that arise from more than one variable. Such techniques allow the establishment of an impartial model using the data sets produced by the methods of the invention.
[0046] In particular embodiments of the invention, multivariate analyses used to establish unbiased models may include, but are not limited to, partial least squares analysis (PLS), partial least squares discriminant analysis (PLSDA), principal component analysis (PCA), latent variable techniques, cross-validation techniques, support vector machines or neural networks.
[0047] As used herein, "partial least squares analysis" refers to a statistical analysis known to those skilled in the art that can be used for quantitative predictions of phenotypic results by finding a linear regression model. "Partial least squares discriminant analysis" means the use of statistical analysis that discriminates between two or more naturally occurring groups. PLSDA is also known to those skilled in the art and can be used in certain embodiments of the invention where qualitative predictions are desired. In cases where PLS or PLSDA is used in the invention, the number of latent variables is selected by cross-validation. "Latent variables" means those variables that are not directly observed, but are rather inferred (through a mathematical model) from other variables that are observed (measured directly). The number of such latent variables can be determined by "cross-validation", that is, techniques that assess how the results of a statistical analysis will generalize to an independent data set. The process of determining the number of latent variables using cross-validation is within the skill of those in the art.
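To make the selection of latent variables by cross-validation concrete, the following minimal sketch fits a single-response PLS model with a NIPALS-style deflation loop and picks the number of latent variables that minimizes the leave-one-out prediction error. The single-response formulation, the leave-one-out scheme and the toy data are all assumptions made for illustration; the source does not specify which PLS algorithm or cross-validation scheme is used.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered predictors
    f = [v - y_mean for v in y]                                 # centered response
    components = []
    for _ in range(n_components):
        w = [dot([row[j] for row in E], f) for j in range(p)]   # weight vector
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:            # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                          # latent variable scores
        tt = dot(t, t)
        load = [dot([row[j] for row in E], t) / tt for j in range(p)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def predict_pls1(model, x):
    """Predict the response for one new sample (e.g., an independent plant)."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * l for v, l in zip(e, load)]
    return y_hat


def loo_press(X, y, n_components):
    """Leave-one-out predicted residual sum of squares."""
    press = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - predict_pls1(model, X[i])) ** 2
    return press


def choose_n_components(X, y, max_components):
    return min(range(1, max_components + 1), key=lambda a: loo_press(X, y, a))


# Toy training set: two metabolite intensities per plant, phenotype = 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [8.0, 7.0, 18.0, 17.0, 28.0]
best = choose_n_components(X, y, max_components=2)
model = fit_pls1(X, y, best)
prediction = predict_pls1(model, [6.0, 5.0])
```

For a qualitative (PLSDA-style) prediction, the same fit could in principle be applied to class labels coded as 0 and 1, though that variant is not shown here.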
[0048] Embodiments of the present invention also encompass the identification and exclusion of outliers in the data sets. As used herein, "outliers" means the rare observations or data points that do not appear to follow the characteristic distribution of the rest of the data. As such, outliers can influence the slope of the regression line and the value of the correlation coefficient. Such outliers can be identified and excluded by statistical methods, including, but not limited to, cross-validation and principal component analysis. "Principal component analysis" means a statistical analysis that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables, called principal components, in which the first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The process of identifying outliers using cross-validation or principal component analysis is within the skill of a person skilled in the art.
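One simple way to use principal component analysis for outlier screening is to project each sample onto the first principal component and flag samples whose score lies far from the rest. The sketch below does this with power iteration; the cutoff of two standard deviations and the example profiles are assumptions for illustration, and the source does not state which outlier criterion is applied.

```python
def first_component_scores(X, iterations=200):
    """Scores of each sample on the first principal component (power iteration)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centered data
    v = [1.0] * p  # starting direction; assumed not orthogonal to the first component
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in E]
        v = [sum(E[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
    return [sum(a * b for a, b in zip(row, v)) for row in E]


def flag_outliers(X, z_cutoff=2.0):
    """Indices of samples whose first-component score exceeds z_cutoff standard deviations."""
    scores = first_component_scores(X)
    sd = (sum(s * s for s in scores) / (len(scores) - 1)) ** 0.5
    return [i for i, s in enumerate(scores) if abs(s) > z_cutoff * sd]


# Hypothetical two-variable profiles; the last sample is far from the others.
profiles = [
    [1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.1],
    [1.1, 1.9], [0.9, 2.0], [1.0, 2.0], [9.0, 10.0],
]
outliers = flag_outliers(profiles)
```

With few samples a single extreme point inflates the standard deviation it is judged against, so in practice more robust criteria may be preferable to this simple cutoff.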
[0049] Other embodiments of the invention provide methods for establishing an unbiased model using support vector machines or neural networks. As used herein, "support vector machines" describes statistical analyses based on linear classifier algorithms that determine a boundary (i.e., an n-dimensional hyperplane) that distinguishes between members of different classes. The term "neural network" is intended to mean a real or simulated network (for example, a computer program) composed of numerous highly interconnected, independent artificial neurons that simulate the functions of biological neurons. The process of using support vector machines or neural networks to establish an unbiased model using the data sets of the invention would be within the skill of a person skilled in the art.
[0050] Embodiments of the invention also include methods for determining the importance of particular metabolites in the unbiased model. Such methods may include, but are not limited to, variable importance in projection (VIP) analysis. As used herein, "VIP analysis" means statistical analysis to determine the importance of each of the variables (that is, the points of the metabolic data sets and phenotypic data) in fitting the PLS or PLSDA model, for both predictors and response. Such VIP analyses can be applied to any of the methods described herein for establishing the impartial model of the invention.
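A commonly used definition of the VIP score of variable j is VIP_j = sqrt(p · Σ_a SS_a · (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of variables, w_a is the weight vector of latent variable a, and SS_a is the response sum of squares explained by that latent variable. The sketch below implements this formula on invented weights; whether the invention uses exactly this definition is an assumption.

```python
def vip_scores(weights, explained_ss):
    """Variable importance in projection.

    weights[a][j]   : weight of variable j on latent variable a
    explained_ss[a] : response sum of squares explained by latent variable a
    """
    p = len(weights[0])
    total = sum(explained_ss)
    scores = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, explained_ss):
            norm_sq = sum(v * v for v in w)
            acc += ss * (w[j] ** 2) / norm_sq
        scores.append((p * acc / total) ** 0.5)
    return scores


# Hypothetical two-latent-variable model over three metabolite signals.
weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
explained_ss = [9.0, 1.0]
vip = vip_scores(weights, explained_ss)
```

Under this definition the squared VIP scores average to 1, so variables with a score above 1 are often read as contributing more than average to the model.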
[0051] Methods of the invention also provide for predicting the development of a phenotype of interest or a characteristic of interest in a plant, using an unbiased model. The term "prediction" or "predict", as used herein, or, in the narrowest sense, the phrase "predict the development of a phenotype of interest or a characteristic of interest", means that the future expression of a phenotype in a plant is anticipated. This expectation is based on the potential for expression of said phenotype of interest or characteristic of interest that the plant exhibits at the time the methods of the present invention are applied. As such, that point in time is earlier than the point in time corresponding to the future expression of the phenotype of interest or characteristic of interest that is being predicted. The term "predisposed", as used herein, is intended to describe a plant that is genetically or environmentally predetermined to develop a phenotype of interest or a characteristic of interest.
[0052] The method of predicting a phenotype of interest or a characteristic of interest in a plant may vary, depending on the embodiment of the invention used to establish the impartial model. For example, in one embodiment, the impartial model can allow for a quantitative prediction, in which partial least squares analysis is used to establish a linear correlation between metabolic profiles and phenotypic or characteristic profiles within a training set. The impartial model can then be applied to the metabolic profile of an independent plant in order to quantitatively predict the development of a phenotype or characteristic.
[0053] In another such embodiment, the impartial model can allow a qualitative prediction, in which PLSDA is used to establish a correlation between the metabolic profiles and the phenotypic or characteristic profiles within a training set. In PLSDA, each plant or group of plants is assigned to a class, that is, plants that exhibit a phenotype or characteristic and plants that do not. The PLSDA model can then be applied to the metabolic profile of an independent plant in order to predict which class the plant most closely resembles. In another embodiment, the probability of a plant developing a phenotype or characteristic can be calculated using the PLSDA model, as described in the Examples presented below.
IV. Plants and Conditions
[0054] The invention provides methods for the characterization of metabolic profiles, phenotypic profiles and characteristic profiles in two or more groups of plants, in order to establish an impartial model. The methods of the invention also encompass independent, immature plants, whose phenotypes or characteristics are predicted by applying the impartial model to their metabolomes.
[0055] As described herein, the term "groups of plants" means any set of plants that share at least one common characteristic. Such a common characteristic may include, but is not limited to, high genetic similarity (for example, a taxonomic unit, pure lineage or hybrid species), a specific mechanism of nitrogen fixation or metamorphosis, the presence of distinct anatomical structures, or the production of a specific type of commercially important material. One of ordinary skill in the art can readily identify "groups of plants" that would be appropriate for the methods taught by the invention. In one embodiment of the invention, the two or more groups of plants that are used to establish the unbiased model are of the same genotype and grown under precision growing conditions. As used herein, "precision growing conditions" refers to growing in a greenhouse under controlled light, temperature, water, nutrients and the like.
[0056] As used herein, the term "independent plant" means a plant that was not a member of any of the groups of plants that were used to establish the impartial model. In an embodiment of the invention, the unbiased model of the invention is applied to the metabolome of the independent plant to predict the expression of a phenotype of interest in the independent plant. In such an embodiment, an independent plant need not be fully mature when its metabolome is characterized. A plant that is "not mature" or "immature" may include, but is not limited to, plants that are not fully grown or are not ready for harvest. Such independent plants can be grown under precision growing conditions or in the natural environment, and can be planted in the same location as, or a different location from, the groups of plants that were used to establish the impartial model of the invention. In addition, these independent plants can be grown at the same time as, or at a different time from, the plants that were used to establish the impartial model of the invention. Furthermore, these independent plants may have the same genetic background as, or a different genetic background from, the plants that were used to establish the impartial model of the invention.
[0057] The two or more groups of plants used to establish the impartial model should preferably exhibit different phenotypes or different characteristics, be grown under different environmental conditions, or both. Such different phenotypes or characteristics may be the result of traditional breeding techniques, such as hybridization, crossbreeding, backcrossing and other techniques known to those skilled in the art. In addition, different phenotypes may be the result of a transgenic event in some or all of the groups of plants that are used to establish the impartial model. Transgenes of interest that can be used in the invention are further described herein below. Phenotypes of interest and characteristics of interest that can be evaluated by the methods of the present invention may include, but are not limited to, plant growth, total plant area, aerial dry biomass, productivity, yield drag, nitrogen use efficiency, water use efficiency, pest resistance, disease resistance, transgene effects, response to chemical treatment, stress tolerance, gas exchange parameters, days to silk, days to shed, germination rate, relative maturity, lodging, ear height, flowering time, stress onset rate, leaf senescence rate, canopy photosynthesis rate, silk appearance rate, anthesis-silking interval, percent recurrent parent, leaf angle, crown diameter, filler leaf width, grain dispersion, root mass, stem strength, seed moisture, greening, chipping, accumulation of visual pigment, grains per ear, ears per plant, kernel size, kernel density, nitrogen content in the leaves and nitrogen content of the grains.
[0058] In other embodiments of the invention, the two or more groups of plants that are used to establish the impartial model can also be grown under different environmental conditions from each other. Such conditions may include natural or man-made conditions, including, but not limited to, temperature, soil moisture, nitrogen level, insect pressure, disease pressure, soil type, pesticide treatment, herbicide treatment, day length, planting density, light intensity, light quality, no-till practice, planting day, carbon dioxide levels, oxygen levels, nutrient deficiency, as well as the presence of heavy metals, pathogens (e.g., bacteria, nematodes, fungi, viruses, etc.), organisms (for example, insects) and other conditions commonly known to those skilled in the art that affect plant growth and/or yield.
[0059] Any gene can be evaluated in the methods of the invention. Such an assessment includes the expression of the gene in a plant of interest, as well as the reduction of gene expression in a plant. Genes of interest that can be evaluated in the methods of the invention reflect the commercial markets and interests of those involved in crop development. General categories of genes of interest include, for example, genes involved in information, such as zinc fingers, those involved in communication, such as kinases, and those involved in housekeeping, such as heat shock proteins. More specific categories of transgenes include, for example, genes that encode important traits for agronomy, insect resistance, disease resistance, herbicide resistance, sterility, grain characteristics and commercial products. Genes of interest generally include those involved in oil, starch, carbohydrate, or nutrient metabolism, as well as those that affect grain size, sucrose loading, and the like.
[0060] Agronomically important characteristics, such as oil, starch, and protein content, can be genetically modified in addition to using traditional breeding methods. Modifications include increasing the content of oleic acid, saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids and also modifying the starch. Modifications of the hordothionin protein are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389, incorporated herein by reference. Another example is the lysine- and/or sulfur-rich seed protein encoded by soybean 2S albumin described in U.S. Patent No. 5,850,016, and the barley chymotrypsin inhibitor, described in Williamson et al. (1987) Eur. J. Biochem. 165:99-106, the disclosures of which are hereby incorporated by reference.
[0061] Derivatives of coding sequences can be made by site-directed mutagenesis to increase the level of pre-selected amino acids in the encoded polypeptide. For example, the gene encoding the barley high lysine polypeptide (BHL) is derived from the barley chymotrypsin inhibitor, U.S. Application Serial No. 08/740,682, filed November 1, 1996, and WO 98/20133, the disclosures of which are hereby incorporated by reference. Other proteins include methionine-rich plant proteins, such as those from sunflower seed (Lilley et al. (1989) Proceedings of the World Congress on Vegetable Protein Utilization in Human Foods and Animal Feedstuffs, ed. Applewhite (American Oil Chemists Society, Champaign, Illinois), pp. 497-502, incorporated herein by reference), maize (Pedersen et al. (1986) J. Biol. Chem. 261:6279; Kirihara et al. (1988) Gene 71:359; both of which are incorporated herein by reference) and rice (Musumura et al. (1989) Plant Mol. Biol. 12:123, incorporated herein by reference). Other genes of agronomic interest encode latex, Floury 2, growth factors, seed storage factors, and transcription factors.
[0062] Insect resistance genes can encode resistance to pests that cause great yield drag, such as rootworm, cutworm, European corn borer, and the like. Such genes include, for example, Bacillus thuringiensis toxic protein genes (U.S. Patent Nos. 5,366,892, 5,747,450, 5,736,514, 5,723,756, 5,593,881, and Geiser et al. (1986) Gene 48:109), and the like.
[0063] Genes that encode disease resistance characteristics include detoxification genes, such as against fumonisin (U.S. Patent No. 5,792,931), avirulence (avr) and disease resistance (R) genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262:1432; Mindrinos et al. (1994) Cell 78:1089) and the like.
[0064] Herbicide resistance characteristics may include genes that code for resistance to herbicides that act to inhibit the action of acetolactate synthase (ALS), in particular sulfonylurea-type herbicides (e.g., the acetolactate synthase (ALS) gene containing mutations that lead to this resistance, in particular the S4 and/or Hra mutations), genes that code for resistance to herbicides that act to inhibit the action of glutamine synthetase, such as phosphinothricin or basta (e.g., the bar gene), glyphosate (e.g., the EPSPS gene and the GAT gene; see, for example, U.S. Publication No. 20040082770 and WO 03/092360), or other such genes known in the art. The bar gene encodes resistance to the herbicide basta, the nptII gene encodes resistance to the antibiotics kanamycin and geneticin, and the mutants of the ALS gene encode resistance to the herbicide chlorsulfuron.
[0065] Sterility genes can also be encoded in an expression cassette and provide an alternative to physical detasseling. Examples of genes used in such ways include male tissue-preferred genes and genes with male sterility phenotypes such as QM, described in U.S. Patent No. 5,583,210. Other genes include kinases and those encoding compounds toxic to either male or female gametophytic development.
[0066] Grain quality is reflected in characteristics such as levels and types of oils, saturated and unsaturated, quality and quantity of essential amino acids, and levels of cellulose. In corn, modified hordothionin proteins are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389.
[0067] Commercial characteristics can also be encoded in a gene or genes that can increase, for example, starch for ethanol production, or provide expression of proteins. Another important commercial use of transformed plants is the production of polymers and bioplastics such as those described in U.S. Patent No. 5,602,321. Genes such as β-ketothiolase, PHBase (polyhydroxybutyrate synthase) and acetoacetyl-CoA reductase (see Schubert et al. (1988) J. Bacteriol. 170:5837-5847) facilitate the expression of polyhydroxyalkanoates (PHAs).
[0068] Exogenous products include enzymes and plant products, as well as other sources, including prokaryotes and other eukaryotes. Such products include enzymes, cofactors, hormones, and the like. The level of proteins, in particular modified proteins that have a better distribution of amino acids to improve the nutrient value of the plant, can be increased. This is achieved by expressing such proteins having improved amino acid content.
[0069] In one embodiment, groups of any plant species can be used to establish the impartial models of the invention, or can be the independent plant(s) whose phenotype(s) or characteristic(s) are predicted using the impartial models of the invention. As used herein, the term "plant" also includes plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants, such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruits, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and so on. Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants and mutants of the regenerated plants are also included within the scope of the invention, provided that these parts comprise the introduced polynucleotides.
[0070] Plants that can be used in the methods of the invention include, but are not limited to, monocots and dicots. Examples of plant species of interest include, but are not limited to, corn (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), in particular those Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soy (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanut (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), manioc (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), beets (Beta vulgaris), sugar cane (Saccharum spp.), oats, barley, vegetables, ornamentals and conifers.
[0071] Vegetables of interest include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), beans (Phaseolus vulgaris), butter beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis) and musk melon (C. melo). Ornamentals include azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosasanensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima) and chrysanthemum.
[0072] Conifers of interest that can be employed in the practice of the present invention include, for example, pines such as loblolly pine (Pinus taeda), slash pine (Pinus elliotii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta) and Monterey pine (Pinus radiata); Douglas-fir (Pseudotsuga menziesii); western hemlock (Tsuga canadensis); Sitka spruce (Picea glauca); redwood (Sequoia sempervirens); true firs such as silver fir (Abies amabilis) and balsam fir (Abies balsamea); and cedars such as western red cedar (Thuja plicata) and Alaska yellow cedar (Chamaecyparis nootkatensis). Hardwood trees can also be employed, including ash, beech, black beech, linden, birch, black cherry, black walnut, buckeye, chestnut, cottonwood, dogwood, elm, hackberry, walnut, holly, locust, magnolia, maple, oak, poplar, red alder, redbud, royal paulownia, sassafras, gum, sycamore, tupelo, willow and yellow poplar.
[0073] In specific embodiments, the plants of the present invention are crop plants (e.g., corn, alfalfa, sunflower, Brassica, soy, cotton, safflower, peanuts, sorghum, wheat, millet, tobacco, etc.). In other embodiments, corn, soy and sugar cane plants are optimal, and in still other embodiments, corn plants are optimal.
[0074] Other plants of interest include grain plants that provide seeds of interest, oilseed plants and leguminous plants. Seeds of interest include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, etc. Oilseed plants include cotton, soy, safflower, sunflower, Brassica, corn, alfalfa, palm, coconut, etc. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soy, garden beans, cowpea, mung bean, fava beans, lentils, chickpeas, etc.
[0075] Other plants of interest include grasses such as, for example, grasses of the genera Poa, Agrostis, Festuca, Lolium and Zoysia. Additional grasses may come from the subfamily Panicoideae. Grasses may further include, but are not limited to, blue grama (Bouteloua gracilis (HBK) Lag. ex Griffiths); buffalo grass (Buchloe dactyloids (Nutt.) Engelm.); red grass (Festuca rubra ssp. coastalis); red fescue (Festuca rubra); colonial bentgrass (Agrostis tenuis Sibth.); creeping bentgrass (Agrostis palustris Huds.); fairway wheatgrass (Agropyron cristatum (L.) Gaertn.); hard fescue (Festuca longifolia Thuill.); Kentucky bluegrass (Poa pratensis L.); perennial ryegrass (Lolium perenne L.); rough bluegrass (Poa trivialis L.); sideoats grama (Bouteloua curtipendula Michx. Torr.); smooth bromegrass (Bromus inermis Leyss.); fescue (Festuca arundinacea Schreb.); annual bluegrass (Poa annua L.); ryegrass (Lolium multiflorum Lam.); redtop (Agrostis alba L.); Japanese lawn grass (Zoysia japonica); bermuda grass (Cynodon dactylon; Cynodon spp. L.C. Rich; Cynodon transvaalensis); seashore paspalum (Paspalum vaginatum Swartz); zoysiagrass (Zoysia spp. Willd; Zoysia japonica and Z. matrella var. matrella); Bahia grass (Paspalum notatum Flugge); carpet grass (Axonopus affinis Chase); centipedegrass (Eremochloa ophiuroides Munro Hack.); kikuyugrass (Pennisetum clandesinum Hochst ex Chiov); browntop bent (Agrostis tenuis, also known as A. capillaris); velvet bent (Agrostis canina); and St. Augustinegrass (Stenotaphrum secundatum Walt Kuntze). Additional grasses of interest include switchgrass (Panicum virgatum).
[0076] The articles “a” and “an” are used here to refer to one or more than one (that is, at least one) of the grammatical objects of the article. For example, “an element” means one or more elements.
[0077] All publications and patent applications mentioned in the specification are indicative of the level of those skilled in the art to which this invention belongs. All publications and patent applications are hereby incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
[0078] Although the foregoing invention has been described in some detail by way of illustration and example for the sake of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.
EXAMPLES
Example 1 - Quantitative prediction of relative plant biomass and nitrogen utilization efficiency under nitrogen deprivation
[0079] Relative plant biomass and nitrogen utilization efficiency (NUE) were predicted in corn using the methods of the invention. Plant biomass was observed as an indicator of plant productivity under different treatment conditions. As used herein, "nitrogen utilization efficiency" or "NUE" is defined as the ratio of aerial plant biomass under low nitrogen conditions to aerial plant biomass under normal nitrogen conditions. Relative plant biomass and NUE were predicted using 250 inbred lines of corn. Twelve replicates of each line were grown in normal nitrogen solutions (6.5 mmol of nitrate) or low nitrogen solutions (1.0 mmol of nitrate). Six batches of each group of plants were grown over six periods from August to January, and the plants were sampled for metabolomics at vegetative stage 7. Characterization of plant metabolic profiles and phenotypic profiles was performed as described below.
Gas chromatography and time-of-flight mass spectrometry methods:
[0080] To characterize the metabolic profile of each plant or group of plants, metabolites were extracted from three lyophilized leaf discs of about 3 mg combined dry weight. Five hundred microliters of chloroform:methanol:water (2:5:2, v/v/v) containing 0.015 mg of ribitol internal standard were added to each sample in a 1.1 mL polypropylene microtube containing two 5/32” stainless steel ball bearings. Samples were homogenized in a 2000 Geno/Grinder ball mill at a setting of 1650 for 1 minute and then rotated at 4 °C for 30 minutes. Samples were then centrifuged at 1454 g for 15 minutes at 4 °C. Then, 300 μL aliquots were transferred to 1.8 mL high-recovery GC vials and subsequently evaporated to dryness in a Speed Vac. The dry residues were re-dissolved in 50 μL of 20 mg mL⁻¹ methoxyamine hydrochloride in pyridine, capped and agitated with a vortex mixer. The samples were incubated on an orbital shaker at 30 °C for 90 minutes to form methoxyamine derivatives. Eighty microliters of N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) were added to each sample to form trimethylsilyl derivatives. MSTFA was delivered to individual samples by the chromatograph autosampler 30 minutes before injection, greatly minimizing sample-to-sample variability due to differences in derivatization state.
[0081] Trimethylsilyl derivatives were separated by gas chromatography on a Restek 30 m × 0.25 mm ID × 0.25 μm film thickness Rtx-5Sil MS column with a 10 m integrated guard column. One-microliter injections were made with a 1:10 split ratio using a CTC Combi PAL autosampler. An Agilent 6890N chromatograph was programmed for an initial temperature of 80 °C for 5 minutes, increased to 350 °C at 18 °C min⁻¹, where it was held for 2 minutes before being quickly cooled to 80 °C in preparation for the next run. The injector and transfer line temperatures were 230 °C and 250 °C, respectively, and the source temperature was 200 °C. Helium was used as the carrier gas with a constant flow rate of 1 mL min⁻¹ maintained by electronic pressure control. Data acquisition was performed on a LECO Pegasus III time-of-flight mass spectrometer with an acquisition rate of 10 spectra s⁻¹ in the mass range of m/z 45-600. An electron beam of 70 eV was used to generate spectra. Detector voltage was approximately 1550-1800 V, depending on the age of the detector. An instrument autotune for mass calibration using perfluorotributylamine (PFTBA) was performed before each GC sequence.
Pre-processing of raw GC/ToFMS data:
[0082] Genedata Expressionist Refiner MS was used to assemble and align the GC/ToFMS data with feature selection and noise reduction. The first step was to load all data and fit it to a common time grid. Noise reduction was then performed through smoothing and thresholding. The retention times were then aligned using a local shift function, with the first chromatogram used as the retention time alignment reference. The output of this workflow was a table of intensities indexed by retention time or retention index and by mass-to-charge ratio, each entry representing an electron-impact molecular fragment detected in the mass spectrometer.
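The noise-reduction and alignment steps of this workflow can be illustrated with a minimal Python sketch. It is not the Genedata implementation: the moving-average smoother, the hard threshold, the integer-shift search and all function names are assumptions chosen only to make the sequence of operations concrete for a single ion trace already sampled on the common time grid.

```python
def moving_average(signal, window=3):
    """Smooth intensities with a centered moving average (window assumed odd)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def apply_threshold(signal, threshold):
    """Zero every intensity that falls below the noise threshold."""
    return [x if x >= threshold else 0.0 for x in signal]


def best_local_shift(reference, trace, max_shift=2):
    """Find the shift s (in grid points) such that trace[i + s] best matches reference[i]."""
    def overlap(shift):
        total = 0.0
        for i, ref_value in enumerate(reference):
            j = i + shift
            if 0 <= j < len(trace):
                total += ref_value * trace[j]
        return total
    return max(range(-max_shift, max_shift + 1), key=overlap)


def align_to_reference(reference, trace, max_shift=2):
    """Shift a trace onto the reference chromatogram, padding with zeros at the edges."""
    shift = best_local_shift(reference, trace, max_shift)
    return [trace[i + shift] if 0 <= i + shift < len(trace) else 0.0
            for i in range(len(trace))]
```

In this sketch the first chromatogram would be passed as `reference` for every other sample, mirroring the use of the first chromatogram as the alignment reference in paragraph [0082].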
[0083] The data was then loaded into the Matlab workspace for further processing. Starting with the last retention time, the correlation between all m/z data points within a 0.5 second retention time window was determined. Within this retention time window, a Pearson correlation coefficient matrix was calculated across all samples. The m/z channels were grouped into clusters using the K-nearest-neighbor clustering method. Clusters were formed when the calculated neighbor distance was less than 1. A cluster additionally required more than five mass-charge fragment channels to be included in the model data. If a mass-charge fragment channel was not within the minimum distance of a cluster of at least five members, it was eliminated from the data table. This process was repeated until every data channel had been either grouped or individually eliminated. After all correlated clusters within a retention time window were calculated, the mass-charge fragment channel that was most frequently the maximum within each cluster across samples was selected, and its intensity was used to represent that cluster in all samples.
Modeling:
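As a sketch of the channel grouping described in paragraph [0083], the following Python code clusters the m/z channels of one retention time window by their correlation across samples. Several choices are assumptions rather than details from the specification: the distance is taken to be 1 minus the Pearson coefficient, neighbors are merged by simple single linkage (a stand-in for the nearest-neighbor method named in the text), and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def group_channels(channels, max_distance=1.0, min_size=6):
    """Cluster m/z channels of one retention time window.

    channels maps each m/z value to its intensities across all samples.
    Two channels are linked when 1 - r < max_distance (assumed metric), and
    clusters with fewer than min_size members ("more than five") are dropped.
    """
    keys = sorted(channels)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if 1.0 - pearson(channels[a], channels[b]) < max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for k in keys:
        clusters.setdefault(find(k), []).append(k)
    return [members for members in clusters.values() if len(members) >= min_size]


def representative(channels, cluster):
    """Pick the channel that is most often the most intense one across samples."""
    n_samples = len(channels[cluster[0]])
    wins = {mz: 0 for mz in cluster}
    for s in range(n_samples):
        top = max(cluster, key=lambda mz: channels[mz][s])
        wins[top] += 1
    return max(cluster, key=lambda mz: wins[mz])
```

Under these assumptions, each surviving cluster is reduced to one data channel, which is the dimensionality reduction that feeds the calibration models.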
[0084] In modeling, all data was pre-processed by autoscaling, that is, by dividing each data channel by its standard deviation over the data set followed by mean centering. In each case, multivariate partial least squares (PLS) calibrations were constructed to predict a quantitative result from the metabolome. Where qualitative predictions were desired, these states were represented numerically as ones and zeros. This practice is commonly referred to as partial least squares discriminant analysis (PLSDA). In each case, cross-validation or validation was used to select the number of latent variables. In no case did the number of latent variables exceed five, and in most cases only two were used. Outliers were identified through principal component analysis and cross-validation. All modeling was performed using the PLS_Toolbox from Eigenvector Research.
Quantitative prediction of relative plant biomass under nitrogen deprivation:
[0085] As previously described, nitrogen utilization efficiency was tested in 250 inbred lines of corn grown in greenhouse pots. Twelve replicates of each line were fed with normal nitrogen solutions (6.5 mmol of nitrate) or low nitrogen solutions (1.0 mmol of nitrate). Six batches of each group of plants were grown over six periods from August to January, and the plants were sampled for metabolomics at vegetative stage 7. To monitor plant productivity, the aerial biomass of each plant was cut, weighed, dried and weighed again. Plant phenotypes were also monitored using the Lemnatech imaging system. These images were used to calculate the specific growth rate and the total leaf area, and for basic RGB color image analysis.
[0086] In this example, plant productivity was assessed by measuring changes in plant dry weight in response to low or normal nitrogen. Since the same plant could not be assigned to both the normal and the low nitrogen treatment to determine its productivity response, averages for lines treated with normal amounts of nitrogen were used in the calculations.
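The ratio described here can be written out as a short Python sketch. The function name, the dictionary layout and the choice to report both per-plant ratios and their mean are illustrative assumptions; the specification states only that averages of the normal-nitrogen plants of a line serve as the denominator.

```python
from statistics import mean


def nue_by_line(low_n_weights, normal_n_weights):
    """Compute NUE per line as low-nitrogen biomass over mean normal-nitrogen biomass.

    Both arguments map a line identifier to a list of aerial dry weights.
    Returns, per line, the per-plant ratios and their mean.
    """
    result = {}
    for line, low_weights in low_n_weights.items():
        # The same plant cannot receive both treatments, so the line average
        # under normal nitrogen stands in for each plant's own reference.
        normal_mean = mean(normal_n_weights[line])
        ratios = [w / normal_mean for w in low_weights]
        result[line] = {"per_plant": ratios, "mean": mean(ratios)}
    return result
```

For example, a line with low-nitrogen dry weights of 2.0 and 4.0 and normal-nitrogen dry weights of 5.0 and 7.0 has a normal-nitrogen mean of 6.0 and therefore a mean NUE of 0.5.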
[0087] Biomass within each treatment group was first predicted using PLS with leave-one-out cross-validation on the phenotypic and metabolomic data for each treated line. The prediction for each inbred line was made with that line removed from the calibration. These predictions are shown in Figures 1 and 2. Figure 1 shows the dry shoot weight versus the predicted dry shoot weight of lines grown under normal nitrogen conditions, where R2 = 0.6723 and the root mean square error of cross-validation (RMSECV) = 0.6573. Figure 2 shows the dry shoot weight versus the predicted dry shoot weight of lines grown under low nitrogen conditions, where R2 = 0.4235 and RMSECV = 0.4607.
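The leave-one-out procedure can be sketched in plain Python as follows. This is not the PLS_Toolbox code used in the example: it is a minimal single-response PLS (NIPALS) written for illustration, with autoscaling recomputed inside every fold, and all function names are assumptions.

```python
import math


def autoscale(X):
    """Return column means, column standard deviations and the autoscaled rows."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant channels
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    return means, stds, scaled


def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS on centered X and y; returns (weight, loading, q) per latent variable."""
    X = [row[:] for row in X]
    y = y[:]
    n, p = len(X), len(X[0])
    components = []
    for _ in range(n_lv):
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in X]
        tt = sum(v * v for v in t)
        if tt == 0:
            break
        load = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(t[i] * y[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        X = [[X[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        y = [y[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return components


def pls1_predict(components, x):
    """Predict the centered response for one autoscaled sample."""
    x = x[:]
    y_hat = 0.0
    for w, load, q in components:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, load)]
    return y_hat


def loo_rmsecv(X, y, n_lv):
    """Leave-one-out cross-validation; returns the predictions and the RMSECV."""
    predictions = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        means, stds, X_scaled = autoscale(X_train)
        y_mean = sum(y_train) / len(y_train)
        model = pls1_fit(X_scaled, [v - y_mean for v in y_train], n_lv)
        x_new = [(X[i][j] - means[j]) / stds[j] for j in range(len(means))]
        predictions.append(y_mean + pls1_predict(model, x_new))
    sse = sum((p - t) ** 2 for p, t in zip(predictions, y))
    return predictions, math.sqrt(sse / len(y))
```

Running `loo_rmsecv` for one to five latent variables and keeping the count with the lowest RMSECV is one common way to select the number of latent variables by cross-validation, consistent with the limit of five stated in paragraph [0084].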
[0088] Shoot dry weight in inbred lines treated with low nitrogen was then predicted using cross-validated PLS and the average metabolome of plants given normal nitrogen levels. Figure 3 shows the shoot dry weight of the low nitrogen lines versus the predicted shoot dry weight of the inbred lines, where R2 = 0.2867 and RMSECV = 0.5136.
[0089] NUE was then predicted for each of the lines. As stated, NUE was determined as the ratio of the aerial biomass of lines treated with low nitrogen to the aerial biomass of lines treated with normal nitrogen. In this example, NUE was predicted for each line using cross-validated PLS and the phenotypic and metabolic data of the lines treated with normal nitrogen. This prediction is compared with the observed values in Figure 4, in which the genotype-specific dry weight ratio between nitrogen-deprived plants and plants that received sufficient nitrogen, as predicted by the metabolomics-based PLS model, is plotted as a function of dry weight.
Example 2 - Qualitative class prediction for ranking transgenes in response to drought
[0090] A PLSDA classification model was used to predict the qualitative effect of two drought tolerance genes in transgenic corn plants. As shown below, the metabolome can be used to assess the effectiveness of a genetic modification by comparing the proximity of the stressed metabolome to a PLSDA classification model constructed between unmodified water-restricted plants and unmodified well-watered plants. This can be achieved through a dimensionally reduced class prediction. In the PLSDA classification model, each metabolite is weighted according to its ability to separate the treatments. The model can then be used to predict the response of transgenically modified plants to stress.
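The idea of a numerically represented class prediction can be sketched in Python as follows. The sketch assumes a single latent variable, mean centering only, a 0.5 decision cutoff and the coding well-watered = 1, water-restricted = 0; none of these settings is taken from the specification, and the function names are hypothetical.

```python
import math


def fit_plsda_one_lv(X, labels):
    """Fit a one-latent-variable PLSDA model; labels are class codes 0 or 1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(labels) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in labels]
    # The weight of each metabolite reflects how well it separates the classes.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict_score(model, x):
    """Numerically represented class prediction (near 1 = well-watered-like)."""
    t = sum((a - m) * b for a, m, b in zip(x, model["x_mean"], model["w"]))
    return model["y_mean"] + model["q"] * t


def predict_class(model, x, cutoff=0.5):
    """Assign class 1 when the score reaches the cutoff, otherwise class 0."""
    return 1 if predict_score(model, x) >= cutoff else 0
```

In use, the model would be fitted on unmodified plants only and then applied to stressed transgenic plants; a score shifted toward 1 indicates a metabolome that resembles the well-watered state despite water restriction.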
[0091] In the present example, two drought tolerance constructs were tested in a greenhouse drought assay. The control plants that were used had independent planting dates for each of the constructs. Seeds of transgenic plants were obtained from the first segregating ear generated from transformed seeds. Fifteen of the null segregants and fifteen of the positive segregants were each grown with sufficient water (well-watered) or restricted water. A PLSDA model was built with the 20 metabolites ranked as the best predictors in the null segregants, as determined by a variable importance in projection (VIP) calculation. This model captures the metabolic changes produced by drought stress across a range of genotypes and environments, which are illustrated in Figure 5.
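A variable importance in projection calculation can be sketched in Python using the textbook VIP formula for a single-response PLS model. The inputs (weight vectors, score vectors and response loadings per latent variable) are assumed to come from an already fitted model, and the function names are illustrative rather than taken from the specification or from any toolbox.

```python
import math


def vip_scores(weights, scores, y_loadings):
    """Compute VIP for each variable of a fitted single-response PLS model.

    weights:    one weight vector per latent variable (each of length p)
    scores:     one score vector per latent variable (each of length n)
    y_loadings: one response loading q per latent variable
    """
    p = len(weights[0])
    # Response variance explained by each latent variable: q^2 * t't.
    explained = [q * q * sum(t * t for t in ts) for q, ts in zip(y_loadings, scores)]
    total = sum(explained)
    vip = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, explained):
            acc += ss * (w[j] ** 2) / sum(v * v for v in w)
        vip.append(math.sqrt(p * acc / total))
    return vip


def top_variables(vip, names, k=20):
    """Return the names of the k variables with the highest VIP."""
    order = sorted(range(len(vip)), key=lambda j: vip[j], reverse=True)
    return [names[j] for j in order[:k]]
```

With `k=20`, `top_variables` yields a shortlist of the kind described in paragraph [0091]; a reduced model would then be refitted on those metabolites only.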
[0092] The model, which was obtained using the phenotypic and metabolic data of the null segregants, was then applied to the metabolic data of the transgene-positive segregants to predict whether the plants exhibited a water-restricted phenotype or a well-watered phenotype. For some transgene-positive segregants, the predicted class was statistically separated from that of the null segregants in the direction of the well-watered metabolome. In Figure 6, the left half of the figure shows the predictions for the null segregants used to build the model, and the right half contains the predictions for the positive segregants. The mean numerically represented class prediction for each of the seven events classified with the PLSDA model is shown in Table I. Transgenic events whose metabolic profiles were significantly shifted in the direction of the well-watered null segregants are highlighted in gray. The events highlighted in gray also had significantly different phenotypes, including, but not limited to, increased plant biomass.
Table I. Predictions of the numerically represented classes that are statistically different in the drought assay, given for the seven events in Figure 6.
Example 3 - Qualitative prediction of transgene responses
[0093] In large-scale testing of transgenic hybrids that express a gene of interest, a preferred high-yield phenotype was observed. Twenty-two hybrids that express the gene of interest were planted in a field trial in Chile. Hybrids of the same genotype, with cells from different genes were also included to provide metabolic contrasts. Based on extensive product testing, the hybrids were classified according to the observed yield effects.
[0094] A PLSDA model was calculated using a genotype with the gene of interest incorporated in the hybrid from each parent. In the Chile experiment, one of these common hybrids exhibited a high-yield phenotype, while the other did not. The classes in this PLSDA model were designated high-yield and other. The model was improved through variable selection using a genetic algorithm, with the other hybrids serving as a validation set. Using replicate predictions, a probability of the high-yield phenotype was calculated from the distribution of the predictions compared to the predictions for the calibration hybrids. Table II contains the metabolome-estimated probability of observing the desired high-yield phenotype. Hybrids that are high-yielding are indicated with a plus sign.
Table II. A phenotype-classifying PLSDA model with metabolome input was built from a single genotypic base and was able to predict the observed high-yield phenotype in other genotypes. Positive phenotypes observed in large-scale tests are indicated with a plus sign.

Claims:
Claims (29)
[0001]
1. Impartial method for predicting the phenotype or trait of at least one independent plant, said method characterized by comprising: (a) characterizing the phenotypic and trait profiles of at least two groups of plants, in which said at least two groups of plants have different phenotypes or traits; (b) establishing for each group of plants a metabolic profile comprising the entire set of metabolites that are detected and pre-processed, in which said establishing comprises: i) extracting metabolites from said at least two groups of plants; ii) separating said metabolites by chromatography to generate a first data set; iii) detecting the mass-charge fragments produced by said metabolites using mass spectrometry to generate a second data set; and iv) pre-processing said first data set and said second data set to align, reduce noise and dimensionality, and normalize; (c) using the pre-processed data from step (b) to construct a multivariate partial least squares calibration to predict quantitative results; (d) selecting latent variables by validation or cross-validation of step (c); (e) establishing a metabolic profile of the at least one independent plant; (f) comparing the metabolic profile of (e) with the calibration of (c) to predict the expression of the phenotype or trait by the at least one independent plant; and (g) cultivating the independent plant predicted to express the phenotype or trait.
[0002]
2. Method according to claim 1, characterized by the fact that it further comprises: selecting said at least one independent plant identified by the method as defined in claim 1, which is expected to express said phenotype or trait of interest.
[0003]
3. Method according to claim 2, characterized by the fact that said at least one independent plant comprises at least one transgene.
[0004]
4. Method according to any one of claims 1 to 3, characterized by the fact that said construction of multivariate partial least squares calibrations further comprises the use of partial least squares discriminant analysis.
[0005]
5. Method according to any one of claims 1 to 4, characterized by the fact that outliers in said method are identified using principal component analysis and cross-validation.
[0006]
6. Method according to any one of claims 1 to 5, characterized in that said method is established using support vector machines.
[0007]
7. Method according to any one of claims 1 to 6, characterized in that said method is established using neural networks.
[0008]
8. Method according to any one of claims 1 to 7, characterized by the fact that variable importance in projection calculations are used to estimate the importance of said metabolites in said method.
[0009]
9. Method according to any one of claims 1 to 8, characterized in that the separation of said metabolites by chromatography is carried out using gas chromatography.
[0010]
10. Method according to any one of claims 1 to 9, characterized in that said metabolites are detected by mass spectrometry using a time-of-flight mass spectrometer.
[0011]
11. Method according to claim 10, characterized by the fact that said pre-processing of said first data set and said second data set to reduce noise and dimensionality comprises: (a) adjusting the mass-charge fragments to a common time grid; (b) reducing noise and dimensionality using statistical analysis, in which said statistical analysis includes smoothing, noise subtraction or thresholding; (c) aligning the retention times or retention indices of the mass-charge fragments using a local shift function; (d) filtering mass-charge fragment × retention time or retention index combinations using thresholding and consistency functions; and (e) normalizing the intensities of said mass-charge fragment × retention time or retention index combinations to the internal standard mass-charge intensity and to the sample dry weight.
[0012]
12. Method according to claim 11, characterized by the fact that it further comprises the steps of: (a) establishing specific retention time or retention index windows; (b) determining a correlation between said mass-charge fragments identified within said specific retention time or retention index windows; (c) calculating a Pearson correlation coefficient matrix for said mass-charge fragments; (d) grouping said mass-charge fragments into clusters using a K-nearest-neighbor clustering method, in which clusters are formed when a calculated neighbor distance is less than 1, and in which each said cluster requires more than 5 mass-charge fragments; (e) eliminating mass-charge fragments that are not within said calculated neighbor distance of a said cluster; and (f) selecting the mass-charge fragment that is most frequently the maximum within each said cluster to represent that cluster in said method.
[0013]
13. Method according to any one of claims 1 to 12, characterized in that said at least two groups of plants are grown under precision growing conditions.
[0014]
14. Method according to any one of claims 1 to 12, characterized in that said at least one independent plant is grown under precision growing conditions or under natural conditions.
[0015]
15. Method according to any one of claims 1 to 13, characterized by the fact that said at least two groups of plants have the same genetic basis.
[0016]
16. Method according to any one of claims 1 to 15, characterized in that said at least one independent plant has the same genetic basis as said at least two groups of plants.
[0017]
17. Method according to any one of claims 1 to 16, characterized in that said at least one independent plant has a different genetic base than said at least two groups of plants.
[0018]
Method according to any one of claims 1 to 17, characterized in that said at least one independent plant is grown under the same environmental conditions as said at least two groups of plants.
[0019]
19. Method according to any one of claims 1 to 17, characterized in that said at least one independent plant is grown under environmental conditions different from said at least two groups of plants.
[0020]
20. Method according to any one of claims 1 to 19, characterized in that said at least one independent plant is grown at the same time as said at least two groups of plants.
[0021]
21. Method according to any one of claims 1 to 20, characterized in that said at least one independent plant is grown at a different time than said at least two groups of plants.
[0022]
22. Method according to any one of claims 1 to 21, characterized in that said at least one independent plant is grown in the same location as said at least two groups of plants.
[0023]
23. Method according to any one of claims 1 to 22, characterized in that said at least one independent plant is grown in a different location from said at least two groups of plants.
[0024]
24. Method according to any one of claims 1 to 23, characterized in that said different phenotypes or traits of said at least two groups of plants are selected from the group consisting of plant growth, total plant area, biomass, shoot dry weight, yield, yield drag, nitrogen utilization efficiency, water use efficiency, pest resistance, disease resistance, transgene effects, response to chemical treatment, stress tolerance, gas exchange parameters, days to silk, days to shed, germination rate, relative maturity, lodging, ear height, flowering time, stress emergence rate, leaf senescence rate, canopy photosynthesis rate, silk emergence rate, anthesis-silking interval and percent recurrent parent.
[0025]
25. Method according to any one of claims 1 to 24, characterized in that said at least two groups of plants are grown under different environmental conditions.
[0026]
26. Method according to any one of claims 1 to 25, characterized in that said different environmental conditions under which said at least two groups of plants are grown are selected from the group consisting of temperature, soil moisture, nitrogen level, insect pressure, disease pressure, soil type, pesticide treatment, herbicide treatment, day length, planting density, light intensity, light quality, no-till practice, day of planting, carbon dioxide level and oxygen level.
[0027]
27. Method according to any one of claims 1 to 26, characterized in that said at least two groups of plants, or said at least one independent plant, are monocots or dicots.
[0028]
28. Method according to claim 27, characterized by the fact that said monocots or dicots are corn, rice, barley, oats, millet, wheat, grass, soybean, cotton, sunflower, safflower, Arabidopsis, tobacco, rapeseed, sugarcane, alfalfa, canola, clover, tomato, potato, cassava or sorghum.
[0029]
29. Method according to any one of claims 1 to 28, characterized in that said method comprises: (a) characterizing the phenotypic or trait profiles of said at least two groups of plants, in which said at least two groups of plants have different phenotypes or traits; (b) establishing for each group of plants a metabolic profile comprising the entire set of metabolites that are detected and pre-processed, in which said establishing comprises: i) extracting metabolites from said at least two groups of plants; ii) separating said metabolites by chromatography to generate a first data set; iii) detecting the mass-to-charge fragments produced by said metabolites using mass spectrometry to generate a second data set; and iv) pre-processing said first data set and said second data set to align, reduce noise and dimensionality, and normalize, where said pre-processing further includes the use of a Pearson correlation coefficient matrix, clustering and a K-nearest neighbor clustering method to select the mass-to-charge fragments to be used in said method; (c) using the pre-processed data from step (b) to construct a partial least squares multivariate calibration or a partial least squares discriminant analysis to predict quantitative results; (d) selecting latent variables by validation or cross-validation of step (c); (e) establishing a metabolic profile of at least one independent plant; (f) comparing the metabolic profile of (e) with the calibration of (c) to predict the expression of the phenotype or trait of said at least one independent plant; and (g) selecting said at least one independent plant that is predicted to express said phenotype or trait of interest.
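By way of illustration only, steps (c), (d) and (f) might be sketched as follows for a single quantitative trait. The example assumes the single-response (PLS1) form of partial least squares fitted by the NIPALS procedure, and leave-one-out cross-validation as the cross-validation scheme; neither choice is dictated by the claim. All function names and the data layout (one row of pre-processed intensities per plant group) are hypothetical.

```python
from math import sqrt


def fit_pls1(X, y, n_components):
    """Fit a PLS1 calibration by NIPALS. X is a list of rows, y a list."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual y.
        w = [sum(row[j] * r for row, r in zip(Xc, yc)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y to explain
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(row[j] * s for row, s in zip(Xc, t)) / tt for j in range(m)]
        q = sum(r * s for r, s in zip(yc, t)) / tt
        # Deflate X and y before extracting the next latent variable.
        Xc = [[row[j] - s * p[j] for j in range(m)] for row, s in zip(Xc, t)]
        yc = [r - q * s for r, s in zip(yc, t)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict the trait value for one new metabolic profile x."""
    resid = [a - b for a, b in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        score = sum(a * b for a, b in zip(resid, w))
        y_hat += q * score
        resid = [a - score * b for a, b in zip(resid, p)]
    return y_hat


def select_n_components(X, y, max_components):
    """Choose the number of latent variables that minimizes the
    leave-one-out prediction error sum of squares (PRESS)."""
    best_n, best_press = 1, float("inf")
    for n_comp in range(1, max_components + 1):
        press = 0.0
        for i in range(len(X)):
            model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
            press += (y[i] - predict_pls1(model, X[i])) ** 2
        if press < best_press:
            best_n, best_press = n_comp, press
    return best_n
```

In use, the number of latent variables returned by `select_n_components` would be passed to `fit_pls1` on the full calibration set, and `predict_pls1` would then be applied to the metabolic profile of the independent plant. A discriminant analysis variant could be obtained under the same assumptions by encoding class membership as a numeric response.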
Patent family:
Publication number | Publication date
US20120119080A1|2012-05-17|
US9465911B2|2016-10-11|
ES2865728T3|2021-10-15|
EP2641205A2|2013-09-25|
EP2641205B1|2021-03-17|
AU2011328963B2|2016-12-08|
AU2011328963A1|2013-05-30|
CA2817241C|2018-10-02|
CA2817241A1|2012-05-24|
AR083897A1|2013-04-10|
WO2012068217A2|2012-05-24|
BR112013012068A2|2016-08-09|
CL2013001399A1|2014-02-21|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Title

ES2171391T3|1990-04-26|2002-09-16|Aventis Cropscience Nv|NEW BACILLUS THURINGIENSIS STRAIN AND ITS GENE ENCODING AN INSECTICIDAL TOXIN.|
US5277905A|1991-01-16|1994-01-11|Mycogen Corporation|Coleopteran-active bacillus thuringiensis isolate|
DE69227911T2|1991-08-02|1999-05-12|Mycogen Corp|NEW MICROORGANISM AND INSECTICIDE|
EP0625006A4|1992-11-20|1996-04-24|Agracetus|Transgenic cotton plants producing heterologous bioplastic.|
AU6162294A|1993-01-13|1994-08-15|Pioneer Hi-Bred International, Inc.|High lysine derivatives of alpha-hordothionin|
US5583210A|1993-03-18|1996-12-10|Pioneer Hi-Bred International, Inc.|Methods and compositions for controlling plant development|
US5593881A|1994-05-06|1997-01-14|Mycogen Corporation|Bacillus thuringiensis delta-endotoxin|
US5792931A|1994-08-12|1998-08-11|Pioneer Hi-Bred International, Inc.|Fumonisin detoxification compositions and methods|
CA2160529A1|1994-10-14|1996-04-15|Toshihiko Iizuka|Bacillus strain and harmful organism controlling agents|
MX9709352A|1995-06-02|1998-02-28|Pioneer Hi Bred Int|HIGH METHIONINE DERIVATIVES OF 'alpha'-HORDOTHIONIN.|
EP0828835A1|1995-06-02|1998-03-18|Pioneer Hi-Bred International, Inc.|HIGH THREONINE DERIVATIVES OF α-HORDOTHIONIN|
US5703049A|1996-02-29|1997-12-30|Pioneer Hi-Bred Int'l, Inc.|High methionine derivatives of α-hordothionin for pathogen-control|
US5850016A|1996-03-20|1998-12-15|Pioneer Hi-Bred International, Inc.|Alteration of amino acid compositions in seeds|
CA2270289C|1996-11-01|2005-09-27|Pioneer Hi-Bred International, Inc.|Proteins with enhanced levels of essential amino acids|
GB9717926D0|1997-08-22|1997-10-29|Micromass Ltd|Methods and apparatus for tandem mass spectrometry|
WO1999038190A2|1998-01-23|1999-07-29|Micromass Limited|Time of flight mass spectrometer and dual gain detector therefor|
US7805388B2|1998-05-01|2010-09-28|Health Discovery Corporation|Method for feature selection in a support vector machine using feature ranking|
US7612255B2|1999-02-03|2009-11-03|Jonathan Gressel|Transgenic plants for mitigating introgression of genetically engineered genetic traits|
IL144657D0|1999-02-11|2002-06-30|Maxygen Inc|High throughput mass spectrometry|
US7462481B2|2000-10-30|2008-12-09|Verdia, Inc.|Glyphosate N-acetyltransferase genes|
AU2002233310A1|2001-01-18|2002-07-30|Basf Aktiengesellschaft|Method for metabolic profiling|
US6896660B2|2001-06-19|2005-05-24|University Of Southern California|Therapeutic decisions systems and method using stochastic techniques|
US6873914B2|2001-11-21|2005-03-29|Icoria, Inc.|Methods and systems for analyzing complex biological systems|
US7747391B2|2002-03-01|2010-06-29|Maxygen, Inc.|Methods, systems, and software for identifying functional biomolecules|
WO2003092360A2|2002-04-30|2003-11-13|Verdia, Inc.|Novel glyphosate-n-acetyltransferase genes|
CA2501003C|2004-04-23|2009-05-19|F. Hoffmann-La Roche Ag|Sample analysis to provide characterization data|
AU2006268776B2|2005-07-08|2011-07-14|Metanomics Gmbh|System and method for characterizing a chemical sample|
EP1910959A1|2005-07-25|2008-04-16|Metanomics GmbH|Means and methods for analyzing a sample by means of chromatography-mass spectrometry|
EP1949054A1|2005-11-16|2008-07-30|ChemoMetec A/S|Determination of chemical or physical properties of sample or component of a sample|
WO2007118215A2|2006-04-06|2007-10-18|Monsanto Technology Llc|Method for multivariate analysis in predicting a trait of interest|
CA2988382A1|2006-11-15|2008-05-22|Agrigenetics, Inc.|Generation of plants with altered protein, fiber, or oil content|
EP1936370A1|2006-12-22|2008-06-25|Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V.|Determination and prediction of the expression of traits of plants from the metabolite profile as a biomarker|
AR076873A1|2009-05-14|2011-07-13|Pioneer Hi Bred Int|REVERSE MODELING FOR THE PREDICTION OF CHARACTERISTICS FROM MULTI-SPECTRAL AND HYPER-SPECTRAL DATA SETS DETECTED REMOTELY|
US8835361B2|2010-06-01|2014-09-16|The Curators Of The University Of Missouri|High-throughput quantitation of crop seed proteins|
ES2865728T3|2010-11-17|2021-10-15|Pioneer Hi Bred Int|Prediction of phenotypes and traits based on the metabolome|
CN102930158B|2012-10-31|2016-01-20|哈尔滨工业大学|Variable selection method based on partial least squares|
DE102013200058B3|2013-01-04|2014-06-26|Siemens Aktiengesellschaft|Automated evaluation of the raw data of an MR spectrum|
US10438581B2|2013-07-31|2019-10-08|Google Llc|Speech recognition using neural networks|
CN106338569A|2016-07-29|2017-01-18|云南省烟草农业科学研究院|Gas chromatographic mass spectrometry-based tobacco stem metabonomics analysis method|
CN106018627A|2016-07-29|2016-10-12|云南省烟草农业科学研究院|Metabonomics analytical method for tobacco pollen based on chromatography-mass spectrometry|
CN106018626A|2016-07-29|2016-10-12|云南省烟草农业科学研究院|Tobacco stigma metabonomic analysis method based on gas chromatography-mass spectrometry|
CN106018654A|2016-07-29|2016-10-12|云南省烟草农业科学研究院|Tobacco column metabonomics analysis method based on gas chromatographic mass spectrometry|
CN106404971B|2016-11-29|2018-07-03|河南工业大学|The method that gas chromatography identifies rice processing accuracy|
CN106897913B|2017-01-22|2020-10-27|华南理工大学|Accurate type selection method for injection molding machine|
US20180239866A1|2017-02-21|2018-08-23|International Business Machines Corporation|Prediction of genetic trait expression using data analytics|
CN109283278B|2018-11-30|2021-10-01|北京林业大学|Method for simultaneously measuring oil content and fatty acid of micro-oil tea seed kernels|
EP3711478A1|2019-03-21|2020-09-23|Basf Se|Method for predicting yield loss of a crop plant|
WO2020188114A1|2019-03-21|2020-09-24|Basf Se|Method for predicting yield performance of a crop plant|
CN113748337A|2019-04-16|2021-12-03|花王株式会社|Soybean yield prediction method|
WO2021252514A1|2020-06-09|2021-12-16|Zymergen Inc.|Metabolite fingerprinting|
Legal status:
2019-06-11| B06T| Formal requirements before examination [chapter 6.20 patent gazette]|
2020-07-21| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2020-12-01| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 16/11/2011, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Title
US41464510P| true| 2010-11-17|2010-11-17|
US61/414,645|2010-11-17|
PCT/US2011/060936|WO2012068217A2|2010-11-17|2011-11-16|Prediction of phenotypes and traits based on the metabolome|